STATISTICAL FRONTIERS*


Gertrude M. Cox
Institute of Statistics, University of North Carolina 


1. INTRODUCTION


I am going to ask you to look forward as we try to discern, as best we can,
what the future holds for statisticians. If ten years ago we had predicted 
some of the things we are doing today, we would have been ridiculed. Now, my 
concern is that we may become too conservative in our thinking. 

Civilization is not threatened by atomic or hydrogen bombs; it is threatened 
by ourselves. We are surrounded with ever widening horizons of thought, 
which demand that we find better ways of analytical thinking. We must 
recognize that the observer is part of what he observes and that the thinker is 
part of what he thinks. We cannot passively observe the statistical universe as 
outsiders, for we are all in it. 

The statistical horizon looks bright. Exciting experiences lie ahead for those 
young statisticians whose minds are equipped with knowledge and who have 
the capacity to think constructively, imaginatively, and accurately. 

Will you, with me, look upon the statistical universe as containing three 
major continents: (1) descriptive methods, (2) design of experiments and in- 
vestigations, and (3) analysis and theory. As we tour these continents, we shall 
visit briefly a few selected well developed countries, where statisticians have 
spent considerable time. As tourists, we shall have to stop sometimes to com- 
ment on the scenery, culture, politics, or the difficulties encountered in securing 
a visa. With our scientific backgrounds, we should spend most of our time 
seeking out the new, the underdeveloped, the unexplored or even the danger- 
ous areas. 

It is one of the challenges of the statistical universe that, as new regions are 
discovered and developed, the horizon moves further away. We cannot visit all 
the frontiers for they are too numerous. I believe that we should try to visualize 
the challenges of the future by looking at typical types of unsolved problems. 
I hope you will find the trip so interesting that you will revisit some of these 
statistical frontiers not as tourists but as explorers. 

You know how many folders and guide books one can accumulate while 
traveling. I am not going even to list the ones used. This will leave you guessing 
whether I am quoting or using original ideas. Many people in this audience
will recognize their statements used with no indication that they are
quotations.


* Presidential address, at the 116th Annual Meeting of the American Statistical Association, Detroit, Michigan,
September 9, 1956.


2. DESCRIPTIVE METHODS CONTINENT 


In planning our tour, I decided to take you first to the descriptive methods 
continent, for it is the oldest and has the densest settlement. The lay concep- 
tion of descriptive methods ordinarily includes these countries: (1) collection 
of data; (2) summarization of data including such states as tabulation, meas- 
ures of central tendency and dispersion, index numbers and the description 
of time series; and (3) the presentation of data in textual, tabular, and graphic 
form. 

The collection of data is the largest country on this descriptive methods 
continent. This country is of common interest and concern to the whole
statistical universe and is by far the oldest country. Official statistics existed 
in the classic and medieval world. In fact, in 1500 B.C. in Judea the population 
is given as 100,000 souls. Practical necessity forced the earliest rulers to have 
some count of the number of people in their kingdom. 

The collection of official statistics has increased in importance over the 
years as evidenced by the large units of our Federal Government such as 
Census, Agriculture, and Labor, organized to collect all kinds of useful data. 

Before going into the frontier area to collect more data, one should check 
carefully the sources of data in the settled areas to be sure that he is not about 
to perform needless duplication. The decision will have to be made whether 
to take a census, or to take a sample from the population. Here, as we stand on 
a ridge, we look over into the sampling country which we shall visit later. 

Between the collection and the summarization of data countries, there is this 
border area, where the police (editors) check our schedule to make sure the 
blanks are filled and that no absurd or highly improbable entries have been
made. As we continue our tour, our papers and passports will be checked 
frequently. 

Our first stop in the summarization country is at the state called tabulation. 
Here the data on all items from the individual schedules are tabulated and 
cross-tabulated. A visit here is prerequisite to all further study of the data 
by statistical methods. 

I shall have to ask you to pass up a visit to the well-known array, ranking, 
and frequency tables states. There still exist disputed areas around the
frequency table, such as the choice of the beginning and extent of class intervals.
These historic frontiers and the political devices such as ratios, proportions 
and percentages are visited by many tourists. 

Let us proceed to two other states, where the calculations of measures of 
central tendency and dispersion are made. The central tendency state has 
several clans. In one, the arithmetic mean is the rule. A second group has a 
median rule, and a third group prefers the mode rule. 

Near the mainland, there are islands between it and the analysis and theory 
continent. Even on these islands mathematical definitions are required for the 
rules used for measuring central tendencies such as the geometric and harmonic 
means. 

As we go on into the dispersion state you will note that the topography is 
becoming less familiar. Yet variation of individuals in a measurable
characteristic is a basic condition for statistical analysis and theory. If uniformity
prevailed, there would be no need for statistical methods, though descriptive 
methods might be desired. 

This variation state also has several clans. One advocates the range as the 
simplest measure to describe the dispersion of a distribution. Another prefers 
the use of the mean deviation, while the most densely populated clan advocates 
the standard deviation. Nearby is a frontier area where dwell less familiar and 
relatively uninteresting groups such as the quartile deviation and the 10-90
percentile range. 
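
For the tourist who would like to compare these clans on one small batch of
data, the following is a minimal Python sketch. The function name `describe`
and the sample values are illustrative assumptions, and the standard deviation
is taken with the n - 1 divisor, which is only one of the conventions in use.

```python
import statistics

def describe(values):
    """Return the measures of central tendency and dispersion named in the text."""
    data = sorted(values)
    mean = statistics.mean(data)
    # n=4 gives the three quartile cut points (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(data, n=4)
    # n=10 gives nine decile cut points; the first and last are the
    # 10th and 90th percentiles.
    deciles = statistics.quantiles(data, n=10)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "mean_deviation": sum(abs(x - mean) for x in data) / len(data),
        "standard_deviation": statistics.stdev(data),  # n - 1 divisor
        "quartile_deviation": (q3 - q1) / 2,
        "p10_p90_range": deciles[-1] - deciles[0],
    }

# Hypothetical observations, chosen only for illustration.
summary = describe([2, 4, 4, 5, 7, 9, 11])
```

The quartile and percentile figures depend on the interpolation rule chosen,
which is part of why those districts remain less settled than the others.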

In this descriptive methods continent, placed in the summarization of data 
country are other states settled by special purpose groups. Let us now visit 
two, the index number and the description of time series states, to look at some 
of their unsettled and disputed frontier problems. 

The index number state, consisting of one or a set of measures for one or a 
group of units, evaluates indirectly the incidence of a characteristic that is not 
directly measurable. We do not have time to visit the single factor index area, 
but proceed directly to the wide open frontiers of multi-factor indexes. For 
example, the price and level-of-living indexes are well known and of vital
interest. On this frontier: (1) Which multi-factor index formula is the best? 
(2) What items should be included? (3) What is the proper weighting of items? 
(4) Is the fixed base or chain method best? (5) How frequently should the base 
be changed? (6) When and how can you remove obsolete commodities and add 
new ones into the index? and (7) If the index number has no counterpart in 
reality, should it be discarded? To settle these frontiers, developments are 
needed on the borders with the theory continent. 
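
Question (4), fixed base versus chain, can be made concrete with a small
sketch. It assumes a Laspeyres-type price index (base-period quantities as
weights), which is only one of the candidate formulas in question (1); the
commodities, prices, and quantities are invented for illustration.

```python
def laspeyres(p_base, q_base, p_now):
    """Cost of the base-period basket at current prices relative to base prices."""
    cost_now = sum(p_now[item] * q_base[item] for item in q_base)
    cost_base = sum(p_base[item] * q_base[item] for item in q_base)
    return cost_now / cost_base

# Hypothetical prices and quantities for two commodities over three periods.
prices = [
    {"bread": 1.00, "milk": 2.00},
    {"bread": 1.10, "milk": 2.00},
    {"bread": 1.21, "milk": 2.00},
]
quantities = [
    {"bread": 10, "milk": 5},
    {"bread": 8, "milk": 6},
    {"bread": 8, "milk": 6},
]

# Fixed base: every period is compared directly with period 0.
fixed_base = [laspeyres(prices[0], quantities[0], p) for p in prices]

# Chain: each period is compared with the one before it, using that
# period's quantities as weights, and the links are multiplied together.
chain = [1.0]
for t in range(1, len(prices)):
    link = laspeyres(prices[t - 1], quantities[t - 1], prices[t])
    chain.append(chain[-1] * link)
```

With these figures the two methods agree in the second period and part
company in the third, because the chain index has changed its weights while
the fixed base index has not.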

In the description of time series state, we find measures recorded on some 
characteristic of a unit (or a group of units) for different periods or points of 
time. There are several method groups governing this state such as inspection, 
semi-averages, moving averages and least squares. Of course, there are disputes 
about which method is best. One of the frontier problems is how to handle 
nonlinear trends. One group of statisticians exploring in this state deals with 
time series accounting for secular trend, cyclical, periodic, and irregular move- 
ments. 
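
Two of the method groups named above, moving averages and least squares, can
be sketched briefly. This is an illustration under simple assumptions: equally
spaced periods numbered 0, 1, 2, ..., a straight-line trend, and a made-up
series. The function names are hypothetical.

```python
def moving_average(series, window):
    """Unweighted moving average over consecutive windows of the given length."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def least_squares_trend(series):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

# Hypothetical series of five periods.
series = [2, 5, 6, 9, 10]
smoothed = moving_average(series, 3)
intercept, slope = least_squares_trend(series)
```

A straight line is the simplest case; the nonlinear trends mentioned as a
frontier problem would need a different model.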

Note that most of the folks in this area are economists. The public health 
and industrial scientists are beginning to explore here. They have such problems 
as fatigue testing, incubation period of a disease, and the life time of radio- 
active substances. 

This is rather an exhausting tour, so much to be seen in so short a time. 
However, before you leave the descriptive methods continent, I want you to 
visit the presentation of results country. The availability and usefulness of 
whatever contribution to scientific knowledge the project has yielded are 
dependent upon the successful performance in this country. 

As we enter the presentation of results country, you will be asked to swear 
allegiance to logical organization, preciseness, and ease of comprehension. In
this country, certain conventions in structure and style of the form of presenta- 
tion have developed and are generally accepted. 

The methods of presentation of results divide into several states: textual,
tabular, and graphic. The textual state gives only statements of findings and
interpretation of results. The tabular state has two types of tables, the general 
and the special purpose tables, according to their functions. In the graphic 
state, quantitative data are represented by geometric designs.
It is obvious that the tourist naive in mathematics will enjoy this state. Some 
of the named community settlements are: the bar diagram, area diagram, 
coordinate chart, statistical map, and pictorial statistics. 


3. DESIGN OF EXPERIMENTS AND INVESTIGATIONS CONTINENT


Later in discovery and development was the analytical statistics hemisphere 
where the tools and techniques for research workers are provided and used. 
The northern continent, called Design of Experiments and Investigations, is 
divided into two major sections, the design of experiments and the design of 
investigations. 

My own random walks have taken me into the design of experiment section 
of this continent more frequently and extensively than into any other area we 
shall visit. 

This section is divided into four major countries: (1) completely randomized, 
(2) randomized block, (3) latin square, and (4) incomplete block designs. The 
first three countries are the oldest and are well developed. However, in the 
latin square country, let us visit a newly explored state, where the latin square 
is adjusted so as to measure residual effects which may be present when the 
treatments are applied in sequence. 
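
As a reminder of the basic latin square from which such adjusted designs
start, here is a minimal sketch. It assumes a cyclic starting square whose
rows, columns, and treatment labels are then shuffled; this does not sample
uniformly from all latin squares, and it makes no adjustment for residual
effects. The function name and seed are illustrative.

```python
import random

def random_latin_square(treatments, seed=None):
    """Shuffle the rows, columns, and labels of a cyclic latin square."""
    rng = random.Random(seed)
    n = len(treatments)
    rows = list(range(n))
    cols = list(range(n))
    labels = list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    # Cell (r, c) of the cyclic square holds treatment number (r + c) mod n,
    # so every treatment occurs once in each row and once in each column.
    return [[labels[(r + c) % n] for c in cols] for r in rows]

square = random_latin_square(["A", "B", "C", "D"], seed=1)
```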

We might inquire about the uprisings in the latin square country when non- 
random treatments are assigned to the rows and columns. This takes you over 
into the incomplete block design country. It is hoped that this area will be 
placed in the incomplete block design country without further trouble. 

The selection of the treatment combinations to go into these countries takes 
us into another dimension of this statistical universe. We have single factor 
and multiple factor treatment combinations. Small factorial groups fit nicely 
into our design countries. If several factors are involved, we may need to 
introduce confounding. This requires settlement in the incomplete block design 
country, where there are more blocks than replications. Some confounded areas 
are settled, such as those where confounding is on a main effect, the split-plot
design country. Here you find political parties with platforms ranging from 
randomization to stripping the second factor. This latter complicates its trade 
relations with the analysis countries. 

Let us continue in the incomplete block design country and cross the state 
where confounding on high-order interactions is practiced. Right near, and 
often overlapping, is a new state using confounding on degree effects. These two 
states are being settled, with good roads already constructed, but the border 
has not been defined or peacefully occupied. 

A rather new and progressive group of settlers are the fractional replication 
folks. Their chief platform is that five or more factors can be included simulta- 
neously in an experiment of a practicable size so that the investigator can dis- 
cover quickly which factors have an important effect on their product. In this 
area the hazard of misinterpretation is especially dangerous when one is not 
sure of the aliases. The penalties may be trivial. However, it seems wise not to
join this group unless you know enough about the nature of the factor inter- 
actions. 
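
The matter of aliases can be illustrated with a half replicate of a
five-factor, two-level experiment. The sketch assumes the defining contrast
I = ABCDE, a common textbook choice and not one taken from this address; the
names `half_fraction` and `alias` are hypothetical.

```python
from itertools import product

# Assumed defining contrast for the half replicate: I = ABCDE.
DEFINING_WORD = frozenset("ABCDE")

def half_fraction():
    """The 16 runs of a 2^(5-1) design, with factor levels coded -1 and +1."""
    runs = []
    for levels in product((-1, 1), repeat=4):
        a, b, c, d = levels
        e = a * b * c * d  # forces the product A*B*C*D*E to equal +1
        runs.append(levels + (e,))
    return runs

def alias(effect):
    """Alias of an effect under I = ABCDE: letters occurring an odd number of times."""
    return "".join(sorted(frozenset(effect) ^ DEFINING_WORD))

runs = half_fraction()
alias_of_a = alias("A")    # main effect A is aliased with BCDE
alias_of_ab = alias("AB")  # interaction AB is aliased with CDE
```

In every run the column for AB equals the column for CDE, so the experiment
cannot separate the two; the interpretation rests on the assumption that the
higher-order interaction is negligible.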

The balanced and partially balanced incomplete block states are being 
settled very rapidly. So far as experimental operations are concerned, the in- 
complete block design country is no more difficult to settle than the complete 
block design country. It will take some extra planning and analysis to live in 
the incomplete block country and you will have to have adjusted means. The 
weights to use to adjust the means are still in a frontier status. 

There are numerous frontier areas in this incomplete block country where 
roads and communications have been established. There are 376 partially 
balanced incomplete block design lots with k>2 and 92 lots with k=2 from 
which to choose. These lots have two associate classes. 

We should look at some of the newer settlements as (1) the chain block and 
the generalized chain block design states; (2) the doubly-balanced incomplete 
block design state where account can be taken of the correlation between 
experimental units; and (3) the paired comparison design areas for testing 
concordance between judges, together with the appropriate agreements with 
the analysis continent. Beyond the latin square country dikes have been built 
to provide new land. There are latin squares with a row and column added or 
omitted, or with a column added and a row omitted. Further work covering 
more general situations will give this design continent more areas for expan- 
sion. 


Let us go now to another large new country which, after negotiations, has
been established by taking sections of the design and analysis continents. The 
process has raised some political issues and questions of international control. 
The development came about because, in the design continent, there is a two- 
party system with data measured (1) on a continuous scale (quantitative 
variable) or (2) on a discontinuous scale (qualitative variable). These party 
members have settled side by side in the design continent for single-factor
groups.

If we have factorial groups, we have to consider both whether the measures 
are continuous or discontinuous and whether the factors are independent or 
not. To handle these problems, some of the continuous scale statisticians have 
established a response surface country. To prepare for the peaceful settlement 
of this response surface country a portion of the regression analysis state has 
been transferred. Whether this separation of portions of countries to make up 
a new country will hold, only time will tell. 

Here in this rather new response surface country, observe that major interest 
lies in quantitative variables, measured on a continuous scale. In this situa- 
tion, it is often natural to think of response as related to the levels of the factors 
by some mathematical function. The new methods are applicable when the 
function can be approximated, within the limits of the experimental region, 
by a polynomial. 

In this tropical and exciting response surface country, the central composite 
and non-central composite states have been settled for some time. Some of the 
other borders are not firmly fixed, as would be expected in a new country. New 
states identified as first, second, third, and higher-order designs are seeking
admittance to this country. They overlap with some of the older countries. 
We can stand over here on this mountain top and see many frontiers as the 
very special central composite rotatable design area, which has been named 
and partially settled with some roads constructed. Over there is the evalua- 
tion frontier where the relative efficiency of these designs and methods needs 
to be determined. 
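
A sketch of the points of a central composite design may help the visitor
picture this area. It assumes the usual construction in coded units: a full
two-level factorial, a pair of axial points on each factor axis, and one or
more center points, with the axial distance set to the fourth root of the
number of factorial points, the value commonly quoted for rotatability. These
details are assumptions, not statements from the address.

```python
from itertools import product

def central_composite(k, n_center=1):
    """Design points (coded units) for k factors: factorial, axial, center."""
    alpha = (2 ** k) ** 0.25  # assumed rotatable axial distance
    factorial = [tuple(float(x) for x in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k) for _ in range(n_center)]
    return factorial + axial + center

design = central_composite(2, n_center=3)
```

For two factors the four factorial and four axial points all lie at the same
distance from the center, which is the geometric picture behind the word
"rotatable."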

Progress has been made on strategies to be used for determining the optimum 
combination of factor levels. In addition to locating the maximum of y, it is 
often desirable to know something about how y varies when the factor levels are 
changed from their optimum values. The efficient location of an optimum 
combination of factor levels often requires a planned sequential series of 
experiments. 

Most experimentation is sequential, since the treatments are applied to the 
experimental units in some definite time sequence. To explore in this area, the 
process of measurement must be rapid so that the response on any unit is 
known before the experimenter treats the next unit. A method of sequential 
analysis gives rules that determine, after any number of observations, whether 
to stop or continue the experiment. 
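
One well-known rule of this kind is the sequential probability ratio test. The
sketch below assumes success-or-failure observations, two simple hypotheses
about the success probability (p0 and p1), and the customary approximate
stopping boundaries; the function name and the numerical settings are
illustrative only.

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test of H0: p = p0 against H1: p = p1.

    Returns the decision and the number of observations used.
    """
    upper = math.log((1 - beta) / alpha)  # cross it: accept H1
    lower = math.log(beta / (1 - alpha))  # cross it: accept H0
    log_ratio = 0.0
    for n, success in enumerate(observations, start=1):
        if success:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "accept H1", n
        if log_ratio <= lower:
            return "accept H0", n
    return "continue", len(observations)

# Hypothetical runs with p0 = 0.1 and p1 = 0.5.
two_successes = sprt([1, 1, 1, 1], 0.1, 0.5)
all_failures = sprt([0] * 10, 0.1, 0.5)
undecided = sprt([1, 0], 0.1, 0.5)
```

After each observation the rule gives one of three answers: stop in favor of
one hypothesis, stop in favor of the other, or take another observation.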

The full sequential approach is often not practical, thus the two or multiple 
stage sequential plan with groups of units handled at one time takes us into 
the frontiers of this region. So far, the matter of testing hypotheses has been 
given major attention, but now sequential methods hold promise of increasing 
the efficiency of both testing and estimation procedures. 

Are you ready now to visit the investigations (more popularly known as 
sampling) section of this design continent? Since this section borders on the 
descriptive methods continent, both continents find that it is essential to 
maintain trade relationships. 

In all fields of experimentation and in most collections of descriptive data 
only a sample from the population can be considered. How to do this efficiently 
presents an extensive horizon. 

I hope you did not forget to get a visa permit to travel into the sample 
design territory. We shall quickly cross the settled simple random sampling 
country. Here is the method of sampling in which the members of the sample 
are drawn independently with equal probabilities. This is a satisfactory place
to settle if the population is not highly variable. On the frontier between this 
country and the other countries of this area, there are two problems: (1) How 
could the present sampling procedures be improved if the observations followed 
a standard distribution form? (2) What are the effects of nonrandomness? The 
inhabitants of these frontiers invade the settled areas frequently, and frontier 
battles result. 

Next, we must cross the systematic sampling country. It is very difficult to 
secure permission from a statistician to enter this country. However, it is 
densely settled mostly by older people who have lived here all their lives. We 
frequently hear about uprisings and renewed efforts of this group to acquire 
all the advantages of the simple random sampling country. 

It appears that settlement in the systematic sampling country can safely 
be recommended if one of the following conditions exists, (1) the order of the 
population is essentially random, or (2) several strata are to be used, with an
independent systematic sample drawn from each stratum. There may be 
populations for which systematic sampling gives extremely precise estimates 
of the mean but never gives reliable estimates of the variance of the mean. 

Perhaps the most popular section of the sampling area is the stratified random
sampling country. The population is divided into parts called strata, then a 
sample is drawn independently in each part. One popular political party selects 
the number of units per stratum by optimum allocation. The second party 
advocates selection of a proportionate number of units per stratum. Some 
recently explored frontier areas are: (1) the determination of optimum alloca- 
tion in multivariate studies, (2) the improvement of criteria for the construc- 
tion of strata, and (3) the selection of the optimum number of strata. 
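
The platforms of the two parties can be compared in a short sketch. It assumes
a single variable, equal cost per unit in every stratum, and known stratum
sizes and standard deviations, in which case optimum allocation makes the
stratum sample size proportional to size times standard deviation. The
figures are invented, and the results are left unrounded.

```python
def proportional_allocation(n, sizes):
    """Sample sizes proportional to the stratum sizes N_h."""
    total = sum(sizes)
    return [n * size / total for size in sizes]

def optimum_allocation(n, sizes, std_devs):
    """Sample sizes proportional to N_h * S_h (equal unit costs assumed)."""
    weights = [size * sd for size, sd in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes N_h and standard deviations S_h.
sizes = [500, 300, 200]
std_devs = [2.0, 5.0, 10.0]

proportional = proportional_allocation(100, sizes)
optimum = optimum_allocation(100, sizes, std_devs)
```

Here the smallest stratum is the most variable, so the optimum platform gives
it the largest share of the sample while the proportional platform gives it
the smallest.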

If you are interested in large sample surveys, you will want to visit the 
multi-stage sampling country. Here the first stage units may be selected with 
probability proportional to size, the second stage units with equal probability. 
An adjacent area has been explored where first stage units are selected with 
arbitrary probability. 

In newer areas of the multi-stage sampling country more than one first stage 
unit per stratum is drawn in order to permit internal assessment of the
sampling errors of estimates. Even here many of these large surveys have been
relegated to the archives without securing the sampling errors of estimates. 
This is done perhaps because of the complexity of the estimating formulas. 
Electronic computing machines are helping to settle this difficulty. In fact, 
the machines may open up even wider frontiers for settlement in the sample 
design countries. 

In all the sampling territory, there are many internal political and economic 
frontiers to be cleared. These sampling countries now have fair control over 
sampling errors but relatively little over non-sampling errors. They realize the 
need to find an economic balance between investment on sample and invest- 
ment on measurement technique. To these developing frontiers, we can add 
others such as: (1) What are the relative efficiencies of the various sampling 
plans? (2) What is the effect of nonresponse? and (3) What is an efficient
method to sample for scarce items? Efforts are being made to clear out the 
underbrush and to settle some of this frontier area around the sampling 
territory. 


4. STATISTICAL INFERENCE; ANALYSIS AND THEORY CONTINENT 


In the analytical statistics hemisphere, we have visited the northern design 
of experiments and investigations continent. Let us start our tour of the 
southern statistical inference or the analysis and theory continent. The broad
problem of statistical inference is to provide measures of the uncertainty of 
conclusions drawn from experimental data. All this territory, in the statistical 
universe, has been discovered and settled by a process of generalizing from 
particular results. 

Let us visit several analytical technique countries, keeping in mind that the 
level of civilization in each of these countries is determined largely by the 
status of its theoretical development. 

First, here is the beautiful and popular t-test country, where testing of
hypotheses and setting up of confidence intervals for univariate populations are
performed. This area is a tourist photographic paradise, but we cannot tarry. 
I know you will return. 

Hurriedly, the way some tourists travel, we shall cross another univariate 
country, analysis of variance. Almost all statisticians, except maybe a few 
theorists, have enjoyed the beautiful lakes and mountains in this country. 
Among the attractive features to explore are the orthogonal sets of single 
degrees of freedom, the separation of simple effects when interaction exists, 
the use of both continuous and discontinuous variables and even the fitting 
of regression models for the fixed continuous variable. This latter region is being 
urged to establish an alliance with the response surface country. 

We have time to observe only a few frontier problems: (1) What is the power 
of analysis of variance to detect the winner? (2) How do you analyze data 
which involve both a quantal and a graded response? (3) How do you attach 
confidence limits to proportions? (4) What about nonhomogeneity of variance 
when making tests of significance? and (5) Should we enter these countries 
with nonnormal data? I may just mention a subversive area, at least it is 
considered so by some, that is, the region where effects suggested by the data 
are tested. 

Are you ready now to visit the correlation country? Bivariate populations 
are often interesting because of the relationship between measurements. First, 
let us visit the well developed product moment correlation section, where the 
cultural level is high due to theoretical verifications. Around here are several 
unincorporated areas, quite heavily populated by special groups, but not too 
well supported by theory. You should be careful if you visit the method of rank
difference, ρ (rho), the non-linear, η (eta), the biserial or the tetrachoric
coefficients of correlation districts.
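
The method of rank difference is simple enough to sketch. The version below
assumes there are no tied observations, in which case the coefficient is
1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1)); the helper names
and the data are illustrative.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_difference_correlation(x, y):
    """Rank-difference (rho) coefficient for paired observations without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical paired measurements.
rho = rank_difference_correlation([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
rho_reversed = rank_difference_correlation([1, 2, 3], [3, 2, 1])
```

With ties the simple formula no longer holds exactly, which is one reason for
the caution urged above.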

While we travel across to the regression country, I might mention that its 
constitution has several articles like the constitution of the correlation country. 
The two are confused by some users of statistics and even by statisticians. 

We had better check to see if you have your visa before we enter the regres- 
sion country. Some of the acceptable reasons for granting visas are: (1) to see 
if Y depends on X and if so, how much, (2) to predict Y from X, (3) to
determine the shape of the regression line, (4) to find the error involved in
experiments after the effect of a related factor is discounted, or (5) to seek
cause and effect.

Some near frontier areas are being settled, such as those where there are 
errors in both the X and the Y variables. Other frontiers include the test of the 
heterogeneity of two or more regressions. How do we average similar ones? 
What about the nonlinear regression lines? 

As we leave the bivariate countries of the analysis and theory continent and 
enter the multivariate countries, we find that life becomes more complicated. 
All kinds of mechanical, electrical and electronic statistical tools have come 
into use. These countries have been developed from, but are not independent of, 
the univariate and bivariate areas by a process of successive generalizations. 
For example, people were taken from the t-test country and by generalization 
they developed the statistic T² country. This T² group does all the things done
by the t group for any number of variates simultaneously, be they mutually
correlated or independent. 

In this multivariate area, new territory related to the analysis of variance 
has been explored and is called the multivariate analysis of variance. Here are 
theoretical frontiers to be explored. Some are (1) What are the values of the 
roots of a determinantal equation and what particular combination of them 
should be used for a particular purpose? (2) What are the limitations and use- 
fulness of the multivariate analysis of variance country? and (3) What are 
the confidence bounds on parametric functions connected with multivariate 
normal populations? 

The next time you come this way, I wish you would stop to explore the areas 
where the discriminant function and factor analysis methods are used. There 
may be some danger that the latter will not be able to withstand the attacks 
being made by those who advocate replacing factor analysis by other statistical 
methods. I personally believe the factor analysis area will resist its attackers
and will remain in the statistical universe as a powerful country.

The simple correlation ideas were generalized into two new countries, the 
multiple correlation country and the less well known canonical correlation 
country, which has two sets of variates. 

Crossing the multiple regression country, we look at the frontiers. There are 
situations where it is desirable to combine scored, ranked, and continuous data 
into a multiple regression or factor analysis. How can this be done legally? 
What about the normal distribution assumptions? 

I cannot resist having you visit the analysis of covariance country for it 
accomplishes some of the same purposes as do the design countries. Covariance 
helps to increase accuracy of estimates of means and variances. However, 
dangerous mountains exist in this country. The explorers may need to develop 
added theory to enable the applied statistician to reach the top of such cliffs 
as the one where the X variable is affected by the treatments. If the treatments 
do affect X, a covariance analysis may add information about the way in 
which the treatments produce their effects. The interpretation of the results 
when covariance is used requires care, since an extrapolation danger may be 
involved. Now that I have acknowledged that we are in a dangerous area, I 
might state that the dangers of extrapolation exist in all regression and related 
areas, and especially back in the response surface country. 

We are ready to enter the variance component country, where separate 
sources of variability are identified. Estimates of these variance components 
are desired. These estimates are used to plan future experiments, to make 
tests of significance, and to set confidence limits. 
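
The simplest case, a balanced one-way classification with random groups, can
be sketched as follows. The estimates come from equating the between-group and
within-group mean squares to their expectations; the function name and data
are hypothetical, and the between-group estimate is returned as computed, so
it can turn out negative.

```python
def one_way_components(groups):
    """Variance component estimates for a balanced one-way random model."""
    k = len(groups)      # number of groups
    n = len(groups[0])   # observations per group (balanced design assumed)
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))
    return {
        "within": ms_within,
        # E(MS between) = sigma2_within + n * sigma2_between
        "between": (ms_between - ms_within) / n,
    }

# Hypothetical data: three groups of three observations each.
components = one_way_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```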

This country is relatively new, so that adequate statistical theory has not 
been developed, thus leaving rugged frontiers: (1) The assumption of additivity 
needs to be explored in detail, (2) A clear statement is needed of how to decide 
whether the interaction in a two-way classification is nonrandom or random, 
(3) More exact methods of assigning confidence limits for the variance com- 
ponents need to be developed, (4) How does one handle the mixed model? (5) 
How can one detect correlated errors? (6) What can be done to simplify the 
analysis of data with unequal variances? (7) What are the effects of various 
types of nonnormality on the consistency and efficiency of estimates? and (8)
Some study needs to be made of the proper allocation of samples in a nested 
sampling problem when resources are limited and good estimates of all com- 
ponents are desired. 

Another section of the variance component country is called components of 
error. The problem of choosing the correct error term in the analysis of two 
or more factors depends upon whether the factors are random or nonrandom 
or upon the question you ask. Do you want the mean difference between treat- 
ments averaged over these particular areas with narrow confidence limits, or 
do you want mean differences averaged over a population of areas of which 
these areas are a sample with broad confidence limits? 

So far, we have visited almost exclusively the parametric inference countries. 
Let us take a glimpse at the frontier in the nonparametric inference territory. 
When the experimenter does not know the form of his population distribution, 
or knows that it is not normal, then he may either transform his data or use 
methods of analysis called distribution free or nonparametric methods. This 
territory is being settled. The area dealing with the efficiency of certain tests 
for two by two tables has been partially settled and some general theorems on 
the asymptotic efficiency of tests have been proved. 

Some of the frontiers are: (1) What is the general theory of power functions 
for distribution free tests? (2) What is the efficiency of nonparametric tests? 
(3) Can sequential methods be applied to nonparametric problems? and (4)
How can two nonnormal populations be compared? 

There are three more general frontiers I wish to mention. (1) How far are we 
justified in using statistical methods based on probability theory for the 
analysis of nonexperimental data? Much of the data used in the descriptive 
methods continent are observational or nonexperimental records. (2) What are 
the effects of nonnormality, heterogeneity, nonrandomness and nonindepend- 
ence of observations to which standard statistical methods are applied? And 
(3) How can we deal with truncated populations in relationship problems? 

As we complete our tour of the three continents, I wish to emphasize the 
fact that there are many important problems of design and statistical inference 
which remain unexplored. 


5. TRAINING FRONTIER 


Our travels took us to only a part of the statistical universe, but we managed 
to observe many frontier areas. I hope one thing impressed you: that is, the 
extent of the need for statisticians to explore these areas. In recent years, 
there have been advances in statistical theory and technology, but the prompt 
application of these to our biological, social, physical, industrial, and national 
defense needs has created an unprecedented demand for intelligent and 
highly trained statisticians. Research workers in many fields are requesting 
the statistician to help both in planning experiments or surveys and in drawing 
conclusions from the data. Administrators are facing the quantitative aspects 
of problems, such as optimum inventories, production schedules, sales efforts, 
pricing policies and business expansion, which call for new mathematical 
methods for solving problems concerned with decision making. 

It is certain that statisticians are destined for a larger role in the present 
and future ages. What are we doing about training? More universities are 
considering statistics as an integral part of the curriculum. They need to 
consider also statistical research as just as vital a part of the educational 
process, for it enhances the professional competence of the teacher and is 
beneficial to the student. The teachers of statistics, especially those who 
handle graduate students, must be research minded. Banal as it may sound, 
the teachers of statistics should be trained in statistics. How many are? It is 
known that high school and college students are taking mathematics courses 
under teachers not trained in mathematics. The percentage of qualified high 
school teachers of science and mathematics has fallen about 53% in the past 
five years. 

The decisions that produce our scientists are often made before the student 
arrives on the college campus. We cannot supply the statisticians we need if 
students do not come to college and if they do not possess adequate high 
school preparation in mathematics and science. We need to learn how to 
educate children to think scientifically, how to select those best fitted for 
scientific research and how to train them. We must go into the recruiting and 
training frontiers. More consideration will need to be given to strengthening 
high school education in science and mathematics. We need to find ways to 
direct the ambitions of boys and girls. What we are facing is not a shortage of 
ability and talent, but a shortage of trained talent at all levels. To produce 
the supply of qualified students needed, the parents, teachers, guidance 
personnel, and public officials must acquire a full understanding of the statis- 
tical education problem. 

In 1955 there were listed 59 Ph.D. degrees given in statistics in the United 
States and Canada with only 20 universities contributing. Here is a wide-open 
frontier. For every high school graduate who eventually earns a doctoral 
degree there are twenty-five others who have the intellectual ability to achieve 
that degree, but do not. The supply seems sufficient. What are we doing to tap 
this supply? Statistics is one of the sciences which is young and inexperienced. 
What the statistical universe is like tomorrow depends on the frontiers we 
open up today and the wisdom we use in planning for the occupation of this 
new territory. 

There are other wide open frontiers. (1) The area related to the revision and 
improvement of course content and teaching methods. We have gone a short 
distance into this region. (2) In the U. S. our people have profound faith in 
education, but they do not have equally profound faith in our educators. (3) 
Whether we like it or not, in the culture which exists in our country today, 
the desirability of a given profession is measured largely in terms of salary. 


6. CONCLUSIONS 


As you statisticians continue your work, you will visit some of these frontiers. 
I hope next time you will go not as tourists but as explorers. 

“Statistician” is a mere word denoting a certain aspect in a human being. 
The fact that you, as an individual, are classified as a “statistician” does not 
free you from obligations and responsibilities toward other human beings. 
Some of you may feel that if you work hard and produce three published 
articles per year, then you are not to be held responsible for the consequences 
of your work. However, your food, shelter, clothing, physical and even social 
comforts and pleasures are provided by others. 

You have an obligation to clarify the foundations of your techniques and 
methods for your clients. Our statistical methods should be tailored to the needs 
of the users, even if this calls for approximate methods. I want you young 
statisticians not to become men of success but rather to become men of value. 
Let me give you one quotation from Einstein, “He is considered successful in 
our day who gets more out of life than he puts in. But a man of value gives 
more than he receives.” 

I do not decry individual effort, but, in considering the challenge of arid 
lands, we must have teamwork; that teamwork should be between individuals, 
universities, and research groups in this and other countries. 

One thing is certain, we are at the beginning of a new age—an age that will 
be richer and will offer more and more opportunities for people whose minds 
are flexible and who are eager to increase their area of awareness. 

How big is the challenge and how far do these frontier areas extend? My 
reply would be like that of the taxi driver when the judge asked him, “How far 
can you see?” His reply, “The sun.” 


A SHORT-CUT GRAPHIC METHOD FOR FITTING THE BEST 
STRAIGHT LINE TO A SERIES OF POINTS ACCORDING 
TO THE CRITERION OF LEAST SQUARES* 


S. I. Askovitz 
University of Pennsylvania 


A simple technique is presented for obtaining without calculation 
the best fitting line according to the least squares criterion, provided 
the points are equally spaced horizontally. A graphic measure of 
residual variability is also derived. 


INTRODUCTION 


THE straight line usually selected as best representing the trend of a series 
of points on ordinary rectangular graph paper is such that the sum of the 
squares of the vertical deviations of the individual points from the line shall 
be a minimum. Most textbooks on statistics or numerical methods contain the 
“normal” equations for computing the equation of this line in the form 
y = mx + b. However, the numerical calculations may become moderately 
time-consuming and tedious, and therefore readily subject to error. 

In the case of any time-series or set of data with equally spaced x-values, the 
best fitting line can be located by means of a pencil and straightedge alone, 
without any computing whatsoever. The method to be described, applicable 
to any number of points whether odd or even, consists essentially of advancing 
along two polygonal lines, the endpoints of which determine the required line. 
This technique should prove especially valuable for automatically recorded 
charts, or for published material where the graph is at hand but not the cor- 
responding numerical data. By means of this method it is possible to draw the 
line of best fit directly on such a graph, without having to write down any 
numerical values or perform any calculations. 


METHOD 


The technique may be illustrated by applying it to the six points A, B, 
C, ..., F, shown in Fig. 1, which might be taken to represent a type of data 
such as price indexes in successive months, the incidence of heart disease per 
decade of life, or annual family income according to number of persons in the 
household. Let s represent the uniform “spacing unit” between consecutive 
x-values. 

Place a ruler or straightedge so that it joins A and B (Fig. 2), and with a 
pencil start at A and draw a straight line in the direction of B, but stopping 
at B’, two-thirds of the distance from A towards B, or on the vertical line (2/3)s 
to the right of A. Holding the pencil point at B’, turn the ruler so that it 
indicates a straight line passing through B’ and C, and starting at B’, draw a line 
in the direction of C, to the point C’, two-thirds of a spacing unit further to the 
right; i.e., C’ lies on a vertical line (2/3)s to the right of B’. Then, starting at C’, aim 
at D and draw a line to D’, two-thirds of a spacing unit to the right of C’. 
This process is continued until the last of the original series of points has been 
included. Let us designate as T the final point arrived at when working toward 
the right. 


Fig. 1. Series of points with equally spaced x-coordinates. It is desired to find the line 
which will make the sum of the squares of the vertical deviations a minimum. 


Fig. 2. Graphic method. Proceed from A through B’, C’, D’, ..., to T, at each step 
turning toward the next in the original series of points and advancing by (2/3)s toward the 
right. Similarly, start at F and work toward the left, through E”, D”, ..., to U. Then 
UT will be the line of best fit. 


* Revision of part of the paper “Rapid graphical methods in statistics,” presented at the Mathematics 
Colloquium of the University of Pennsylvania, December 1, 1955. This work was started at the Ophthalmology 
Research Laboratory, Albert Einstein Medical Center, Northern Division, Philadelphia, under a research grant from 
the Weinstock Fund, and continued at the Department of Therapeutic Research, School of Medicine, University of 
Pennsylvania, Philadelphia, supported in part by Grant H-625 (C-7) from the National Heart Institute of the 
National Institutes of Health. For suggesting the addition of some measure of residual variability to the original 
manuscript, the author expresses his thanks to the referee. 

Now start at the right end of the same series of points, and proceed in a 
similar manner toward the left, arriving at point U. Join UT. This is the re- 
quired line of best fit for the series of points A to F, according to the criterion 
of least squares. 

Should it be desired, the arithmetic mean of the original y-values may now 
be graphically determined, merely by taking the y-coordinate of M, the mid- 
point of UT (or the point where UT is intersected by the ordinate half-way 
between the first and last ordinates). 
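
The construction can also be mimicked numerically as a check. The sketch below is illustrative only: the function names and the six ordinates are made up for the example, the spacing unit is taken as s = 1, and the first point is placed at x = 0. It traces the two polygonal paths by "aiming" at each successive point and advancing two-thirds of a unit, then compares the line UT with the line from the usual normal equations.

```python
def advancing_centroids(ys, step):
    """Trace the polygonal path: start on the first point, aim at each next
    point in turn, and advance `step` spacing units horizontally each time.
    Points are assumed to sit at x = 0, 1, ..., n-1 (an assumption of this sketch)."""
    px, py = 0.0, float(ys[0])
    for k in range(1, len(ys)):
        slope = (ys[k] - py) / (k - px)      # direction from current pencil point to point k
        px, py = px + step, py + slope * step
    return px, py


def graphic_fit(ys):
    """Return U, T and the slope and intercept of the line UT."""
    n = len(ys)
    T = advancing_centroids(ys, 2.0 / 3.0)             # left-to-right pass
    ux, uy = advancing_centroids(ys[::-1], 2.0 / 3.0)  # right-to-left pass on the mirrored series
    U = (n - 1 - ux, uy)                               # mirror the abscissa back
    slope = (T[1] - U[1]) / (T[0] - U[0])
    intercept = U[1] - slope * U[0]
    return U, T, slope, intercept


def least_squares(ys):
    """Ordinary least squares line for x = 0, 1, ..., n-1."""
    n = len(ys)
    xbar = (n - 1) / 2.0
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    sxx = sum((x - xbar) ** 2 for x in range(n))
    m = sxy / sxx
    return m, ybar - m * xbar


ys = [2.0, 3.5, 3.0, 5.5, 5.0, 7.5]          # hypothetical ordinates for A, ..., F
U, T, slope, intercept = graphic_fit(ys)
m, b = least_squares(ys)
mean_from_graph = (U[1] + T[1]) / 2.0         # ordinate of M, the midpoint of UT
print(U, T, slope, intercept, m, b, mean_from_graph)
```

For any series of two or more equally spaced points the two slopes and intercepts should agree to rounding error, and the ordinate of M should equal the arithmetic mean of the y-values.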


CHECK 


The accuracy of the graphic steps should be tested by noting first that the 
ordinates through U and T should exactly trisect the space representing the 
extent of the original x’s (Fig. 3). The correctness of M should be checked by 
locating this point independently, according to a rapid graphic method pub- 
lished by the author previously¹ (or by comparing its ordinate with the value 
of ȳ obtained numerically). 


PROOF 


It was pointed out many years ago² that if to each of a set of n points is as- 
signed a mass equal to its x-coordinate, and the center of mass of these weighted 
points obtained, then this centroid³ will lie upon the line of best fit. One can 
readily obtain this result from the second of the normal equations for the line 
of least squares.⁴ 

It now becomes advantageous to move the Y-axis to O’Y’, one spacing unit 
s to the left of point A, and to take s as the scale-unit for the x’s. The x-coordi- 
nates of the points now become 1, 2, 3, ..., and the problem is thus reduced 
to the graphic location of the centroid of a series of horizontally equidistant 
points weighted by increasing integers. This problem has been solved in a 
recent paper by the author,⁵ and the result is accomplished by exactly the pro- 
cedure of “advancing centroids by two-thirds” outlined above. 

In order to prove this statement, let us temporarily relabel the points P₁, P₂, 
P₃, ..., and let Cₖ designate the centroid of the weighted points P₁, P₂, ..., 
Pₖ, assigning to the point Pᵢ(i, yᵢ) the weight i equal to its subscript. Then 
the X-coordinate of Cₖ will equal 

$$\frac{\sum_{i=1}^{k} i \cdot i}{\sum_{i=1}^{k} i} = \frac{k(k+1)(2k+1)/6}{k(k+1)/2} = \frac{2}{3}k + \frac{1}{3}.$$

Thus, each succeeding centroid lies on a vertical line two-thirds of a unit to 


1 “Rapid method for determining mean values and areas graphically,” Science, 121 (1955), pp. 212-3, reprinted 
in Agricultural Engineering, 36 (1955), p. 673. Essentially, one proceeds through the points from left to right as in 
the present paper, but advancing by (1/2)s at each step. From this basic article on the method of advancing centroids, 
conclusions similar to the present author's were arrived at empirically by Irving H. Sher in his communication 
“Two methods for obtaining least squares lines,” Science, 123 (1956), pp. 102-4, but a complete mathematical proof 
was not given. 

2 P. Werkmeister, “Ermittlung der plausibelsten Geraden einer fehlerzeigenden Punktreihe,” Zeitschrift für 
angewandte Mathematik und Mechanik, 1 (1921), pp. 491-4. 

3 The technical difference between center of gravity and center of mass is not important here, and the term 
“centroid” will be used for either without distinction. The centroid of two points lies upon the straight line joining 
them, dividing the segment into a ratio equal to the ratio of the two weights, and situated nearer to the “heavier” 
point. In determining the centroid of a larger set of points, any two or more points may be replaced by their sub- 
centroid, to which must be assigned a mass equal to the sum of the masses of the component points. 

4 Applying the weights wᵢ = xᵢ to the n points Pᵢ(xᵢ, yᵢ) results in the centroid Q with coordinates 
x_Q = Σwᵢxᵢ/Σwᵢ = Σxᵢ²/Σxᵢ and y_Q = Σwᵢyᵢ/Σwᵢ = Σxᵢyᵢ/Σxᵢ. Upon dividing through each term of the normal equation, 
Σxᵢyᵢ = m Σxᵢ² + b Σxᵢ, by Σxᵢ, it is seen that Q lies upon the line y = mx + b. 

5 “Mean rates of change and least sq 
Physiology, 8 (1955), pp. 347-52. 


Fig. 3. Check, by spacing of A, U, T, F, and by alternative determination of M. 
Proof (see text). 


Fig. 4. Determining the mean deviation, VW, by locating the centroids of the 
upper and lower endpoints of the deviations AA₁, BB₁, CC₁, ..., FF₁. 


the right of the previous one. Also, the centroid Cₖ must lie upon the straight 
line passing through the preceding centroid Cₖ₋₁ and the next point Pₖ (con- 
sidering that, in determining the centroid of P₁, P₂, ..., Pₖ₋₁, Pₖ, the first 
k−1 of the Pᵢ may be replaced by their subcentroid Cₖ₋₁). Since C₁ coincides 
with P₁, it is easily seen by mathematical induction that the graphic method 
truly provides the entire series of centroids Cₖ. On the diagram, B’ is the 
weighted centroid of A and B with masses of 1 unit at A and 2 units at B; C’ 
represents the centroid of A, B, and C, weighted respectively 1, 2, and 3; 
and so on. 

Therefore, the point T is the weighted centroid of the complete set of points, 
and consequently lies upon the line of best fit. It should be clear that relocation 
of the Y-axis or altering the size of the coordinate scales will not change the 
minimizing property of the unique line of best fit associated with the given 
points. 

By symmetry, the same argument applies for the right-to-left process, when 
one places the Y-axis at O’’Y’’, one spacing unit to the right of the last point 
in the series, and takes the positive direction for the X-axis toward the left. 
The point M represents the centroid of the entire series of points equally 
weighted, which will also lie upon the line of best fit.⁶ 
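
The trisection property used in the CHECK section follows directly from the abscissa of the weighted centroid. A short worked version is given here as an illustration; n denotes the number of points, and abscissas are measured in spacing units from O’Y’, so that the points sit at X = 1, 2, ..., n with A at X = 1 and the last point at X = n.

```latex
% Abscissa of T, the centroid of all n points weighted 1, 2, ..., n:
X_T = \frac{\sum_{i=1}^{n} i^2}{\sum_{i=1}^{n} i} = \frac{2n+1}{3},
\qquad
X_T - X_A = \frac{2n+1}{3} - 1 = \tfrac{2}{3}(n-1).

% By the mirror-image argument for the right-to-left pass,
% U lies two-thirds of the span from the last point:
X_U - X_A = (n-1) - \tfrac{2}{3}(n-1) = \tfrac{1}{3}(n-1).

% Hence U and T cut the span n-1 into three equal parts, and the
% midpoint of UT lies at the middle of the span, above the mean abscissa:
\tfrac{1}{2}\,(X_U + X_T) - X_A = \tfrac{1}{2}(n-1).
```

For the six points of Fig. 1 the span is 5 spacing units, so U and T should fall 5/3 and 10/3 units to the right of A.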


MEAN DEVIATION 


The advantage of the graphic method described above would be consider- 
ably diminished if it were necessary to carry out the usual computations in 
order to estimate the residual variability. However, it is possible to obtain 
quite readily the mean deviation (i.e., the mean of the absolute values of the 
vertical deviations of the original points from the line of best fit) by a simple 
graphic process. 

Consider the segments representing the deviations (Fig. 4). Obtain the cen- 
troid of the upper endpoints of these segments, and of the lower ends. The points 
are taken equally weighted, and one may proceed either from left to right or 
vice versa.⁷ The segment joining the two centroids (VW on the diagram) will 
then equal the mean deviation. As a check, this segment should be exactly bi- 
sected by the line of best fit. 
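
The same relationship can be verified numerically. In the sketch below (function names and data are hypothetical; the points are again assumed to sit at x = 0, 1, ..., n-1), the equally weighted centroids of the upper and of the lower endpoints are obtained simply as averages, which is what the graphic centroid construction yields.

```python
def least_squares(ys):
    """Ordinary least squares line for x = 0, 1, ..., n-1."""
    n = len(ys)
    xbar = (n - 1) / 2.0
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    sxx = sum((x - xbar) ** 2 for x in range(n))
    m = sxy / sxx
    return m, ybar - m * xbar


def endpoint_centroids(ys, m, b):
    """Ordinates of V and W: the centroids of the upper and of the lower
    endpoints of the vertical deviation segments (equal weights)."""
    upper, lower = [], []
    for x, y in enumerate(ys):
        fitted = m * x + b
        upper.append(max(y, fitted))   # top of the deviation segment
        lower.append(min(y, fitted))   # bottom of the deviation segment
    n = len(ys)
    return sum(upper) / n, sum(lower) / n


ys = [2.0, 3.5, 3.0, 5.5, 5.0, 7.5]   # hypothetical ordinates
m, b = least_squares(ys)
v, w = endpoint_centroids(ys, m, b)

mean_deviation = sum(abs(y - (m * x + b)) for x, y in enumerate(ys)) / len(ys)
line_at_mean_x = m * (len(ys) - 1) / 2.0 + b   # ordinate of the fitted line below V and above W
print(v - w, mean_deviation, (v + w) / 2.0, line_at_mean_x)
```

The length VW should match the directly computed mean deviation, and the midpoint of VW should fall on the fitted line, which is the bisection check mentioned above.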


COMMENTS 


In plotting the points, it is advantageous to have the scale on the graph 
paper so selected in advance that the intervals between the original points are 
already divided into thirds. But if this has not been done, one can generally 
estimate the division points fairly well without much loss in accuracy. 

It will become apparent that the actual drawing in of the segments AB’, 
B’C’, ..., E’T, and FE”, ..., is not essential. After acquiring 
a little familiarity with the procedure, one may mark down merely the points 
B’, C’, ..., T and D”, ..., U.⁸ 

If one should desire to extend the time series or to add further data to the 
original values, then the polygonal lines leading to M and T may easily be con- 
tinued toward the right to include the additional data, and the revised trend 
line drawn through the new points M’ and T’. Similarly, one may easily delete 
values at the end of the series by working backward from M and T. In general, 
however, it is preferable to use U and T as the points through which to draw 
the line of best fit, rather than M and T, because the greater separation of U 
and T will result in a more accurate slope. 
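
Extending the series amounts to one further "aim and advance" step from each of M and T. The following sketch illustrates this under stated assumptions: the names and data are hypothetical, points sit at x = 0, 1, ..., n-1, and M is traced by the equal-weight method of Footnote 1, advancing half a spacing unit per step.

```python
def advance(position, target, step):
    """Move from `position` toward `target`, `step` units to the right."""
    px, py = position
    tx, ty = target
    slope = (ty - py) / (tx - px)
    return (px + step, py + slope * step)


def walk(ys, step):
    """Polygonal path through a whole series, starting on the first point."""
    pos = (0.0, float(ys[0]))
    for k in range(1, len(ys)):
        pos = advance(pos, (k, ys[k]), step)
    return pos


def least_squares(ys):
    """Ordinary least squares line for x = 0, 1, ..., n-1."""
    n = len(ys)
    xbar = (n - 1) / 2.0
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    sxx = sum((x - xbar) ** 2 for x in range(n))
    m = sxy / sxx
    return m, ybar - m * xbar


old = [2.0, 3.5, 3.0, 5.5, 5.0, 7.5]   # hypothetical original series
new_value = 8.0                         # hypothetical additional observation

T_old = walk(old, 2.0 / 3.0)            # weighted centroid (two-thirds steps)
M_old = walk(old, 1.0 / 2.0)            # equal-weight centroid (half steps)

# Continue each polygonal line by a single step toward the new point.
new_point = (len(old), new_value)
T_new = advance(T_old, new_point, 2.0 / 3.0)
M_new = advance(M_old, new_point, 1.0 / 2.0)

revised_slope = (T_new[1] - M_new[1]) / (T_new[0] - M_new[0])
m, b = least_squares(old + [new_value])
print(M_new, T_new, revised_slope, m)
```

The line through M’ and T’ should coincide with the least squares line refitted to the extended series, without retracing the earlier steps.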

Papers describing other applications of the “advancing centroids” technique, 
including methods for working out graphically an entire sequence of moving 
averages, the mean value and standard deviation of a frequency distribution, 
the resultant of a series of vectors, and approximation by polynomials of higher 
degree, are in preparation. 

6 By dividing through each term of the first of the normal equations, Σyᵢ = m Σxᵢ + nb, by n, it is seen that 
the coordinates of the centroid M(Σxᵢ/n, Σyᵢ/n) satisfy the equation y = mx + b of the line of least squares. 

7 The details are explained in the first reference of Footnote 1. 

8 Ultimately, one may omit all the intermediate points also, and mark only the endpoints T and U. 


In many practical applications, especially when a long question 
schedule is given to two (or more) series of individuals, the computation 
of a large number of chi-square tests may become burdensome. For cer- 
tain common types of data the computation can be avoided, or greatly 
reduced, by the charts presented in this paper. If so desired these charts 
may be used as a screen for borderline 2×2 tables and the occasional 
borderline case calculated by the usual methods. 


THE PROBLEM 


ONE of the most widely employed of all statistical techniques is the chi-square 
test for 2×2 tables. In many practical situations a given body of 
data (often on punched cards) may lead to literally hundreds of tables, especial- 
ly when long question schedules are involved. It becomes a burdensome chore 
to perform the hundreds of chi-square tests that a conscientious research worker 
sometimes feels are necessary. Various shortcut procedures have been devised 
to meet this particular problem; for example, there are extensive tabulations 
of Fisher’s exact test [3, 4, 6, 10] as well as various nomograms and other 
devices. The principal difficulty with the published tables is that multiple 
entries are necessary. For example, if a 2×2 table is denoted symbolically as 
in Table I below: 


TABLE I 
NOTATION FOR A 2×2 TABLE 


              Attribute A    Attribute Not A    Total 
Sample 1           a                b             N₁ 
Sample 2           c                d             N₂ 
Total              T              N − T           N 


a tabulation of the exact test requires four entries, N₁, N₂, a, and c. Since N₁ 
and N₂ can be any integers, there is an enormous number of possible combina- 
tions of values for a and c, at least for the larger values of N₁ and N₂. The size of 
the tables, even with some restrictions, quickly tends to become encyclopedic. 

The approach to the problem proposed here, while subject to certain restric- 
tions on the kinds of 2×2 tables that are covered, allows a far more compact 
presentation. This economy is achieved by exploiting a characteristic of the chi- 
square test that seems to have been largely overlooked: The calculated prob- 
abilities are not greatly dependent on the actual numerical values of N₁ and N₂ 
but rather are principally affected by the ratios of these numbers or (what is 
more convenient to tabulate) by 

$$P = \frac{N_1}{N_1 + N_2}, \qquad Q = \frac{N_2}{N_1 + N_2}.$$

Furthermore (in the cases to be considered here) for a given probability level 
such as 5% or 1%, it will suffice to tabulate the critical values of a or c with 
respect to the quantity T (T < N − T). Thus it becomes possible to construct a 
table with just two entries, P or Q and T, which can cover a very substantial 
proportion of the 2×2 tables ordinarily encountered in research problems. We 
will return to this matter in more detail in the section on justification. 

In many practical applications the 2×2 tables encountered tend to have the 
following characteristics: 


(1) The two samples to be compared are of the same order of magnitude, 
that is, the larger sample is less than, say, nine times as great as the small- 
er sample. 

(2) The quantity T (i.e., the total number of cases in both series with a given 
attribute) is small relative to the total number of cases in both series. 
For the usual two-tailed test at the 5% level “small” might be inter- 
preted as requiring that T be less than 20% of N₁+N₂. If T is larger 
than 20% of N₁+N₂ the procedure tends to become overconservative (as 
compared to the usual chi-square test). In any event the procedure can 
be used as a screen and some users might wish to go through the usual 
calculations for occasional borderline cases. 

(3) The value of T will often be less than 50. This restriction is not inherent 
in the method and it is planned to publish more complete tables at a 
later date. 


For any 2×2 table with these three characteristics, the charts presented in 
this paper will allow the research worker to determine the statistical significance 
of his data by calculating the ratios 

$$\frac{N_1}{N_1 + N_2}, \qquad \frac{N_2}{N_1 + N_2}.$$

In many applications where N₁ and N₂ remain the same over a series of tables, 
the research worker can make the required significance tests with what amounts 
to no calculations. Moreover, the research worker has his choice as to whether 
he wants to use one-tailed or two-tailed tests and also has a choice of four com- 
mon significance levels. The charts are very simple to use and the risk of com- 
putational error is greatly reduced. 

One word of warning: Although these charts make it possible to perform 
large numbers of significance tests with very little effort, it still remains for 
the research worker to carefully consider: 


(1) Is the chi-square test the appropriate technique to use in the problem? 

(2) What modification in the interpretation of the tests is required when 
data on the same series of individuals are repeatedly subjected to the 
tests? 

(3) What factors (or biases) other than sampling variation or the particular 
factor under study may account for any observed significances? 
CHART I. CHART OF CRITICAL VALUES 

One-tailed: 5% Probability Level 
Two-tailed: 10% Probability Level 

[Chart not reproduced. Top scale: values of P or Q; left margin: values of T; channels: critical values.] 


USE OF THE CHARTS 


Instructions applying to the use of the charts given in this paper are rela- 
tively simple. The numbers listed horizontally at the top of the chart are the 
values for P or Q ranging from .10 to .90. The numbers listed vertically in the 
left margin are values of T = a + c which we have tabulated from 5 to 49. In 
order to use the charts, therefore, all that is needed are the values of P and T. 
The chart is entered along the horizontal line corresponding to the value of 
T under consideration. Move along horizontally until the line for the desired 
value of P is crossed. The intersection of the P and T lines will fall in one of 
the bands or channels that extend more or less diagonally across the chart. 
By following the channel to the right hand margin or to the bottom of the 
chart the corresponding critical number may be read. Should the P and T lines 
intersect on the boundary, and there be a question of possible significance, 
the user may wish to employ a standard method. Channels have been alter- 
nately stippled and left white to avoid mistakes during this operation. In 
the right side margin the numbers associated with the stippled channels 
appear in the column labeled “s” and those numbers associated with the 
white channels appear in the column labeled “w.” Similarly, in the bottom 
margin, the row labeled “s” and the row labeled “w” contain all numbers cor- 
responding to the stippled and white channels respectively. Occasionally the 
first few channels fail to reach the margins, in which case they have been clearly 
marked at the top as “zero,” “one,” or “two.” For example, if T = 9 and P = .40 
and we wish to use Chart II, the critical number is zero. If T = 22 and P = .70 
the critical number (since this is a stippled channel) can be read either in the 
column of the right margin or in the row of the bottom margin marked “s.” 
The critical number is found to be 10. 


CHART II. CHART OF CRITICAL VALUES 

One-tailed: 2.5% Probability Level 
Two-tailed: 5% Probability Level 


CHART III. CHART OF CRITICAL VALUES 

One-tailed: 1% Probability Level 
Two-tailed: 2% Probability Level 


CHART IV. CHART OF CRITICAL VALUES 

One-tailed: 0.5% Probability Level 
Two-tailed: 1% Probability Level 

[Charts II to IV not reproduced. Top scale: values of P or Q; left margin: values of T; channels: critical values.] 

The use of the critical number will depend on whether a one-tailed test or a 
two-tailed test is to be employed. In most work it is conventional to use a two- 
tailed test. For two-tailed tests the procedure is as follows: To find the critical 
number for a, enter the table with T and P where 

$$P = \frac{N_1}{N_1 + N_2}.$$

Repeat the process to find the critical number for c, but this time enter the 
table with T and Q where 

$$Q = \frac{N_2}{N_1 + N_2}.$$

If either a or c is equal to or smaller than the corresponding critical number, 
then the data are significant at the level indicated on the chart for “two-tailed” 
test. 

For the one-tailed test the appropriate procedure is slightly more compli- 
cated since only one of the two quantities, a or c, will be tested, and it is im- 
portant to select the right one. The choice depends on the hypothesis under 
test. If the hypothesis is that the risk of falling in the a cell is increased the test 
should be made on c. If the hypothesis is that the risk of falling in the a cell is 
decreased then the test should be made on a. For the one-tailed test the signifi- 
cance level is the one indicated on the chart. 


ILLUSTRATIVE EXAMPLES 


Application of the preceding method and use of the charts will perhaps be 
made clearer by some examples. Recently a study was made to examine the 
influence that environmental factors may have on larynx and lung cancer. One 
of the factors under study was occupation, because it was suspected that cer- 
tain occupations might have a greater risk of cancer. Although the study con- 
tained a considerable series of patients (209 larynx cancer cases, 132 epidermoid 
lung cancer cases, and 209 controls), the largest number appearing in any one 
occupation category was only 13. The following tables represent a portion of 
the basic data. 


TABLE II 
OCCUPATION OF PATIENTS IN THREE SERIES 

                              Larynx    Lung    Controls 
Lumber, carpenter               13        4        11 
Paper, furniture maker           4        4         8 
Metal grinders                   4        4         4 
Steamfitters, electrician       12        7        10 
Welder, riveter                  3        1         2 
Hot Metal Worker                 7        4         2 
Cold Metal Worker               10        7         9 
Painter                          8        7         9 
Printer                          2        1         1 
Oil Industry                     3        0         1 
Mechanics                        6        3         5 
Total                           72       42        62 
Total Number of Cases          209      132       209 


Because of the matching this sort of data ought, strictly speaking, to be 
analyzed by the method of paired comparisons. In the method of paired com- 
parisons, pairs where both individuals were engaged in the suspect occupation 
would be eliminated. Aside from this the method is the same as the one given 
here. Since few, if any, cases would occur where both members of the pair were 
engaged in the suspect occupation, the procedure given here is virtually equiva- 
lent to the paired comparison procedure. 

Let us first consider the case where N₁ = N₂, which occurs if we compare the 
larynx cancer series to the matched control series. Suppose that we wish to 
make two-tailed tests at the 5% level. For the occupation category “Hot Metal 
Worker” we might construct the following 2×2 table: 


TABLE III 
A 2×2 TABLE FOR A GIVEN OCCUPATION 


              Hot Metal Worker    Not Hot Metal Worker    Total 
Larynx             a = 7               b = 202           N₁ = 209 
Controls           c = 2               d = 207           N₂ = 209 
Total              T = 9             N − T = 409          N = 418 


Since N₁ = N₂ = 209, then P = Q = .50. We may use Chart II for a 5% probability 
level to obtain the critical values. Here, of course, since P = Q, we would have 
the same critical value for both a and c. Enter the chart at T = 9 and move hori- 
zontally to P = .50. Here the critical number is 1. Neither a nor c is as small as 
1 so the verdict is “not significant.” 

It will be noted that it is not necessary to construct the 2X2 table (which 
was included here only for purposes of exposition). Since P is the same, the 
only arithmetic operation required to test a given occupation is to add a to c 
for all the comparisons between the larynx and control series. It is, therefore, 
possible to perform a long series of significance tests virtually by inspection. 

Now let us consider a case in which N₁ is not equal to N₂, i.e. comparison of 
the epidermoid lung and the control series. (Note: Although the control series 
had been matched to the larynx series it turned out that the age distribution, 
religion, education, etc., also were similar to that of the epidermoid lung series so 
that this series could be used for comparisons with lung cancer as well.) Once 
again we will consider a two-tailed test but this time we shall use the 1% level 
(Chart IV). Since here N₁ = 132 (epidermoid lung) and N₂ = 209 (controls) we 
will use 

$$P = \frac{132}{341} = .39$$

to obtain the critical value of a and 

$$Q = \frac{209}{341} = .61$$

to obtain the critical value of c. Again considering the “Hot Metal Workers” 
we see that T = 4 + 2 = 6, so that the intersection of T = 6 and P = .39 lies in the 
region where no significant difference can be demonstrated at this level. In 
other words, even if a were zero this would not suffice to demonstrate a differ- 
ence at the 1% level. The critical value for c turns out to be zero, but in the 
data c = 2; consequently the verdict is “not significant.” 

As a third example we could consider a one-tailed test at the 5% level. One 
motivation for such a test might be the fact that previous studies of occupation 
and lung cancer had suggested that certain occupations, including Hot Metal 
Workers, might have an increased risk. To test this hypothesis we would use 
c because under the hypothesis we would expect c to be “too small.” The critical 
number for c could be obtained from Chart I (T=6, Q=.61). The number 
turns out to be 1 so that the verdict again is “not significant.” If there had 
been five “Hot Metal Workers” in the lung series and one in the control series 
we would have obtained significance for the one-tailed test at the 5% level. 

We wish to reiterate what we stressed earlier: When the chi-square test is 
used in this fashion for larger series of data, the interpretation of the findings 
requires some care and caution. If, for example, we used the 5% level and made 
all possible intercomparisons in Table II, we would have made 33 significance 
tests (i.e., including larynx vs. epidermoid lung). In the complete absence of 
any real effects we might expect to get at least one “significant” value. On the 
other hand, the power of the chi-square is known to be poor when the numbers 
are small so that it is quite possible for fairly substantial real effects to be present 
which will not be detected by this procedure. The user is also cautioned against 
switching significance levels or changing from two-tailed to one-tailed tests 
merely for the purpose of achieving some desired demonstration. 
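
The warning about 33 tests can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the 33 tests are independent and that each rejects a true null hypothesis with probability exactly 5%. Neither assumption holds strictly here (the comparisons share series, and tests on small counts are discrete), so the figures are only a rough guide. The function name is illustrative.

```python
def false_positive_summary(num_tests: int, alpha: float) -> tuple[float, float]:
    """Return (expected number of false positives, P(at least one)).

    Idealized: assumes independent tests, each rejecting a true null
    hypothesis with probability exactly `alpha`.
    """
    expected = num_tests * alpha
    prob_at_least_one = 1.0 - (1.0 - alpha) ** num_tests
    return expected, prob_at_least_one


expected, prob_any = false_positive_summary(33, 0.05)
print(f"expected false positives: {expected:.2f}")
print(f"P(at least one false positive): {prob_any:.3f}")
```

Under these idealized assumptions between one and two spurious "significant" values would be expected, which agrees with the caution stated above.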


JUSTIFICATION 


We will consider the derivation for a 2×2 table where the probability of 
Attribute A is small in both samples. Furthermore we will assume that the two 
samples (of size $N_1$ and $N_2$ respectively) are independent. Under these 
conditions a and c will approximately follow independent Poisson distributions so 
we may write: 

\[ P(a, c \mid \theta_1, \theta_2) = P(a)P(c) = \frac{e^{-N_1\theta_1}(N_1\theta_1)^a}{a!} \times \frac{e^{-N_2\theta_2}(N_2\theta_2)^c}{c!} \tag{1.01} \]

where: 

$\theta_1$ = the probability of Attribute A in sample 1, 
$\theta_2$ = the probability of Attribute A in sample 2. 

Under the null hypothesis $\theta_1 = \theta_2 = \theta$, so that equation (1.01) reduces to: 

\[ P(a, c \mid \theta) = \frac{e^{-\theta(N_1+N_2)}\,\theta^{a+c}\,N_1^a N_2^c}{a!\,c!} \]

which can be rewritten as: 

\[ P(a, c \mid \theta) = P(a \mid T)\,P(T \mid \theta) = \left[\frac{T!}{a!\,c!}\left(\frac{N_1}{N_1+N_2}\right)^a \left(\frac{N_2}{N_1+N_2}\right)^c\right] \left[\frac{e^{-(\theta N_1+\theta N_2)}(\theta N_1+\theta N_2)^T}{T!}\right] \tag{1.02} \]

In other words the probability of the sample can now be expressed as the 
product of two probabilities where the conditional probability is no longer 
dependent on $\theta$ and can be used as the basis for a significance test. What is more, 
this is a simple binomial distribution: 


\[ P(a \mid T) = \frac{T!}{a!\,c!}\,P^a Q^c \tag{1.03} \]

where 

\[ P = \frac{N_1}{N_1+N_2} \quad\text{and}\quad Q = \frac{N_2}{N_1+N_2}. \]

Equation (1.03) provides the basis for the charts presented earlier. It will be 
noted that (1.03) depends on two quantities, T and P. For given values of T 
and for a specified probability level (such as 2.5%) the critical value of a (or c) 
depends only on P. Hence, using a Binomial table [11] we readily obtain the 
critical numbers presented in the preceding charts. 
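
The chart lookups can be imitated numerically from the binomial distribution of a given T. The following is a minimal sketch, not the procedure used to draw the charts: it assumes the "critical number" is the largest count whose lower-tail binomial probability does not exceed the one-tail level (2.5% for a two-tailed 5% test, 0.5% for a two-tailed 1% test). The function names are illustrative.

```python
from math import comb
from typing import Optional


def binomial_cdf(k: int, T: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(T, p)."""
    return sum(comb(T, j) * p**j * (1.0 - p) ** (T - j) for j in range(k + 1))


def critical_number(T: int, p: float, tail_prob: float) -> Optional[int]:
    """Largest k with P(X <= k) <= tail_prob, or None if even k = 0 fails.

    None corresponds to the region of a chart where no significant
    difference can be demonstrated.
    """
    critical = None
    for k in range(T + 1):
        if binomial_cdf(k, T, p) <= tail_prob:
            critical = k
        else:
            break
    return critical


# The three worked examples from the text (Hot Metal Workers).
print(critical_number(9, 0.50, 0.025))  # two-tailed 5%, T = 9, P = .50
print(critical_number(6, 0.39, 0.005))  # two-tailed 1%, value for a
print(critical_number(6, 0.61, 0.005))  # two-tailed 1%, value for c
print(critical_number(6, 0.61, 0.05))   # one-tailed 5%, value for c
```

Under this reading the sketch gives 1, no critical value, 0 and 1 respectively, in agreement with the chart readings quoted in the worked examples.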

So far we have assumed that $\theta$ is some small probability and an important 
practical question is: How small is “small”? 

In order to examine this question a comparison has been made with the usual 
chi-square test (with Yates’ correction). The results are shown in Table IV. 
Two relative sample sizes have been considered, the first where $N_2 = N_1$ and 
the second where $N_2 = 4N_1$. To answer the question of how small $\theta$ must be 
we have listed two proportions, 20% and 50%; the latter strains our method to 
the extreme. The numbers listed in the body of the table are the differences 
found when subtracting the critical value given in the chart from the critical 
value obtained from a chi-square test. A dash means that for both the chart 
values and for chi-square no significance can be demonstrated for any sample 
value. An x means that the critical value obtained by chi-square was zero 
whereas no value would yield significance if the charts were used. 


TABLE IV 
DIFFERENCES IN CRITICAL NUMBERS (5% LEVEL) 
(Yates-corrected chi-square minus chart values) 

[Table body not recoverable. Columns: relative sizes of samples ($N_2 = N_1$ and $N_2 = 4N_1$) by over-all proportion in category (20% and 50%). Rows: values of T from 5 to 50 in steps of 5.] 

28 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1957 


For the special case where $N_1 = N_2$ a direct comparison can also be made 
with Fisher’s exact test which has been extensively tabulated by Swaroop and 
others [10, 6]. Since both the exact test and the chi-square test gave identical 
results we did not bother tabulating separately the differences for the exact 
test. 

It will be seen from Table IV that if the over-all proportion in the category 
is 20%, the charts are overconservative in 10 of 30 cases and that the largest 
difference between the critical numbers is 1. On the other hand if the over-all 
proportion is 50%, the charts are always overconservative and the difference in 
critical numbers may be 2. On the basis of Table IV and other explorations the 
value 20% would seem to serve as a rough guide for two-tailed tests at the 5% level. 


A NOTE ON THE EFFECTS OF NONRESPONSE ON SURVEYS 


K. A. BROWNLEE 
University of Chicago 


Berkson [1] has recently illustrated by a hypothetical numerical example 
what might happen in a survey of the association between cigarette smoking 
and cancer of the lung if the rates of recruitment of the population to the 
survey vary according to the situation of the individual. It seems worthwhile 
to present his model in more generality and to explore some of its properties. 

Berkson proposes that the over-all population of size N accessible to the 
survey team (which, of course, may not be representative of the corresponding 
over-all group in the U. S.) is, unbeknownst to us, or at least unrecorded by 
the survey team, actually divided into two categories, those who are close to 
expiring and those who are not. Call these groups “unhealthy” and “healthy,” 
and let the proportions of the population belonging to these groups be C and 
(1 − C) with death rates $D_1$ and $D_2$. 

It is clear that even if correct in principle, in detail this model must be an 
oversimplification, since the concept of only two states of health with their 
associated probabilities C and (1 − C) and death rates $D_1$ and $D_2$ is too crude. 
A model with some parameter of health, say h, continuous over some interval, 
with a probability distribution, and corresponding to each value of h a death 
rate D(h), presumably a monotonic function of h, would be more realistic. To 
consider such a model we would need to specify the distribution of h and the 
form of the function D(h). It seems probable that the main features of such a 
more complex model will emerge from our simple model. 

In this model, we assume that the probability of an individual being a smoker 
is S irrespective of whether he belongs to the unhealthy or healthy group. 

We now suppose that the recruitment rates of the population to the survey 
will be represented by the following symbols: 


(a): unhealthy, smokers: $R_{11}$ 
(b): unhealthy, nonsmokers: $R_{10}$ 
(c): healthy, smokers: $R_{01}$ 
(d): healthy, nonsmokers: $R_{00}$ 


Berkson’s numerical model is a little less general than this, in that he has 
$R_{11} = R_{10}$: the motivation for this is that he hypothesizes that the unhealthy 
group, people about to expire, will have a relatively low rate of recruitment to 
the survey, and in this case the probability of recruitment is independent of 
smoking (i.e. $R_{11} = R_{10} = 0.5$ in his example), whereas with healthy people the 
nonsmokers are more willing to enter the survey, or are more eagerly recruited, 
than the smokers (in his example $R_{00} = 0.99$, $R_{01} = 0.65$). 

An important feature of this model is that the probability of death of an 
individual is independent of whether he is a smoker or nonsmoker. For example, 
for smokers the total number of deaths in the unhealthy group is $NSCD_1$ and 
in the healthy group $NS(1-C)D_2$, so the total number of deaths is 
$NS[CD_1 + (1-C)D_2]$ in a group of size NS, so the death rate is $[CD_1 + (1-C)D_2]$. For 
nonsmokers the calculation proceeds analogously, with (1 − S) in place of S, 
and leads to the same result. 

Of course, we do not observe the death rates in the population, but only in 
that portion of the population which enters the survey. For smokers, the 
numbers of the two types of individuals entering the survey are $NCSR_{11}$ and 
$N(1-C)SR_{01}$, subject to death rates $D_1$ and $D_2$: the observed death rate for 
the smoking group is thus 

\[ \frac{CD_1R_{11} + (1-C)D_2R_{01}}{CR_{11} + (1-C)R_{01}}. \tag{1} \]

Similarly, the observed death rate for the nonsmoking group is 

\[ \frac{CD_1R_{10} + (1-C)D_2R_{00}}{CR_{10} + (1-C)R_{00}}. \tag{2} \]


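The ratio of the observed smoker death rate to the observed nonsmoker death 
rate, say F, is (1)/(2), which reduces to 

\[ F = \frac{\left[1 + \left(\frac{1}{C} - 1\right)\frac{D_2}{D_1}\frac{R_{01}}{R_{11}}\right]\left[1 + \left(\frac{1}{C} - 1\right)\frac{R_{00}}{R_{10}}\right]}{\left[1 + \left(\frac{1}{C} - 1\right)\frac{R_{01}}{R_{11}}\right]\left[1 + \left(\frac{1}{C} - 1\right)\frac{D_2}{D_1}\frac{R_{00}}{R_{10}}\right]}. \tag{3} \]

In (3), the cases of C = 0 or 1 are to be excluded, since they imply that the 
population is not made up of two groups but of only one group; similarly, the 
case $D_1 = D_2$ is to be excluded, since this also implies that there is only one 
group, since the two groups would be indistinguishable. 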
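
As a numerical illustration of the ratio F, the sketch below evaluates the two observed death rates directly from their definitions. The recruitment rates are those quoted from Berkson's example (0.5, 0.5, 0.65, 0.99); the values of C and of the two death rates are hypothetical, chosen only to show the effect, and the function names are illustrative.

```python
def observed_death_rate(C, D1, D2, R_unhealthy, R_healthy):
    """Death rate among recruits of one smoking class.

    C is the unhealthy fraction, D1 and D2 the death rates of the
    unhealthy and healthy groups, and R_unhealthy, R_healthy the
    recruitment rates of that smoking class in each group.
    """
    deaths = C * D1 * R_unhealthy + (1 - C) * D2 * R_healthy
    recruited = C * R_unhealthy + (1 - C) * R_healthy
    return deaths / recruited


def death_rate_ratio(C, D1, D2, R11, R10, R01, R00):
    """Ratio F of observed smoker to observed nonsmoker death rates."""
    smokers = observed_death_rate(C, D1, D2, R11, R01)
    nonsmokers = observed_death_rate(C, D1, D2, R10, R00)
    return smokers / nonsmokers


# Hypothetical C, D1, D2; recruitment rates as in Berkson's example.
F = death_rate_ratio(C=0.1, D1=0.5, D2=0.01,
                     R11=0.5, R10=0.5, R01=0.65, R00=0.99)

# With equal healthy-group recruitment for smokers and nonsmokers
# the distortion disappears.
F_equal = death_rate_ratio(C=0.1, D1=0.5, D2=0.01,
                           R11=0.5, R10=0.5, R01=0.8, R00=0.8)
print(round(F, 3), round(F_equal, 3))
```

Although the model gives smokers and nonsmokers identical death rates in the population, the differential recruitment in the first call produces an observed ratio above one.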


Thus for the ratio of observed death rates, smokers to nonsmokers, to be 
equal to one, we must have 

\[ \frac{R_{11}}{R_{01}} = \frac{R_{10}}{R_{00}}, \tag{4} \]

which implies that for smokers the ratio of recruitment rates for unhealthy 
and healthy people is the same as for nonsmokers, or, to put the same thing 
another way, that for healthy persons the ratio of recruitment rates for smokers 
and nonsmokers is the same as for unhealthy persons. 

If no restriction is placed upon the R’s, then it is possible for F to approach 
zero or infinity. 

The most desirable situation, of course, is that in which the probability of an 
individual being recruited to the survey is independent of his health and of his 
smoking status, i.e. the recruitment rate is the same for all sections of the 
population: 

\[ R_{00} = R_{01} = R_{10} = R_{11}. \tag{5} \]


Under these conditions, the expected value of the over-all death rate observed 
in the sample will be that of the population, and likewise for the proportion of 
smokers. On the other hand, if this over-all death rate and this proportion of 
smokers as observed in the sample do not agree with the corresponding popu- 
lation values (assuming these to be known from other sources) then this is a 
clear indication that condition (5) does not hold. The observed result for F can 
then only be relied upon if we are confident that (4) is satisfied, but clearly it is 
not easy to produce evidence to bear upon this point. 
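In the population, the ratio of smokers to nonsmokers is S/(1 − S). In the 
sample, this ratio is 

\[ \frac{S[CR_{11} + (1-C)R_{01}]}{(1-S)[CR_{10} + (1-C)R_{00}]} = \left(\frac{S}{1-S}\right)\frac{CR_{11} + (1-C)R_{01}}{CR_{10} + (1-C)R_{00}}. \]

Define the distortion of the ratio of smokers to nonsmokers as f: 

\[ f = \frac{CR_{11} + (1-C)R_{01}}{CR_{10} + (1-C)R_{00}}. \tag{6} \]

Going back to (1) and (2), we can write 

\[ F = \frac{CD_1R_{11} + (1-C)D_2R_{01}}{CD_1R_{10} + (1-C)D_2R_{00}} \cdot \frac{1}{f}. \tag{7} \]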


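We may now adopt the restriction in Berkson’s model, that $R_{11} = R_{10} = R$, 
say, and consider 1/F: 

\[ \frac{1}{F} = f\,\frac{CD_1R + (1-C)D_2R_{00}}{CD_1R + (1-C)D_2R_{01}}. \tag{8} \]

The condition for maximizing F, or minimizing 1/F, for a given f is to put 
$R_{01} = 0$ (assuming $D_2 < D_1$). Then (6) gives $(1-C)R_{00}/(CR) = 1/f - 1$, 
in which case 

\[ \frac{1}{F} = f + \frac{D_2}{D_1}(1 - f). \tag{10} \]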


This implies that 1/F must always be greater than f, or F must always be less 
than 1/f.* In other words, if the observed ratio of death rates is greater than 
the ratio of the nonsmoker to smoker ratios for sample and population (assuming 
this to be known for the population from other sources), then we cannot 
account for the whole of the difference between death rates by differential 
sampling and an alternative hypothesis must be sought: for example, we might 
suppose that the hypothesis that the death rates for smokers and nonsmokers 
are the same may need to be abandoned. However, this does not seem to be a 
very satisfactory basis for making statistical inferences: it amounts to saying 
“Granted my sampling scheme is poor, it is not that poor.” While we may 
make decisions in everyday life on this basis, it seems hardly good enough for 
the formulation of scientific conclusions. 


* I am indebted to J. L. Hodges, Jr., for pointing this out to me. 


REFERENCE 


A MODIFICATION OF KENDALL’S TAU FOR THE CASE OF 
ARBITRARY TIES IN BOTH RANKINGS 


LETA MCKINNEY ADLER 
Fayetteville, Arkansas 


In certain correlation problems where the number and frequency of tied 
rankings are regarded as arbitrary, that is, the result of the ranking procedure 
rather than of the qualities of the ranked individuals (e.g., Guttman scale 
scores [2]), the criteria for perfect correlation of standard coefficients of 
correlation may be inappropriate. Kendall’s tau, Spearman’s rho, and the ordinary 
product-moment coefficient of correlation do not yield a perfect correlation 
when pairs of scores can be arranged so that both rankings are in natural order 
unless all individuals (and only those individuals) who are tied in each set of 
ties on one ranking are likewise tied in corresponding sets of ties on the other 
ranking. 

In cases for which ties are considered arbitrary, the following criteria for 
perfect correlation are suggested: pairs of ties can be arranged so that all ties in 
each ranking are grouped together and sets of tied scores are in natural order, 
individuals tied in one ranking not necessarily being tied in the other ranking. 
Stated otherwise, individuals arranged in natural order with regard to one 
ranking can be arranged in natural order with regard to the other by shifting only 
individuals tied with regard to the first ranking. It is the purpose of this paper 
to present a modification of Kendall’s tau which produces a correlation 
coefficient of unity under the above conditions. 

Kendall’s tau for the untied case is given by 

\[ \tau = \frac{S}{\tfrac{1}{2}n(n-1)} \tag{1} \]

where n is the number of individuals ranked. If individuals are arranged so 
that their scores on ranking A are in natural order, then P is the number of 
comparisons in which the scores of each two individuals compared are also in 
natural order on ranking B; Q is the number of comparisons in which scores are 
in inverse order for ranking B; and S = P − Q [3, pp. 4-5]. In case of a tie in 
one or both rankings, the comparison is assigned a value 0, and Kendall’s 
formula for ties in both rankings becomes, 

\[ \tau_b = \frac{S}{\sqrt{[\tfrac{1}{2}n(n-1) - T][\tfrac{1}{2}n(n-1) - U]}} \tag{2} \]

when agreement between the two rankings is the object of measurement 
[3, pp. 34-6]. In this expression, $T = \tfrac{1}{2}\sum t(t-1)$, where t is the number of tied 
positions in one set of ties and the summation runs over all such sets in one 
ranking. U is the corresponding value for the other ranking. 

The denominator of (2) reduces to $\tfrac{1}{2}n(n-1)$ in the case where T = U = 0. 
However, if U = 0, while T > 0, the denominator becomes 

\[ \sqrt{[\tfrac{1}{2}n(n-1) - T][\tfrac{1}{2}n(n-1)]}. \]

The denominator for the case of ties in one ranking is given by Sillitto as 

\[ \tfrac{1}{2}n(n-1) - T. \tag{3} \]

The latter form actually reduces the total number of comparisons by the 
number of tied comparisons [4, p. 36]. 

The number of tied comparisons can be considered to come from three 
sources: the comparisons of individuals who are tied on both rankings, $T_1$; 
comparisons of individuals tied on ranking A only, $T_2$; and comparisons of 
individuals tied on ranking B only, $T_3$. In Kendall’s formula, $T = T_1 + T_2$ and 
$U = T_1 + T_3$. Under the condition that Q = 0, $\tau_b$ can be written, 

\[ \tau_b = \frac{\tfrac{1}{2}n(n-1) - (T_1 + T_2 + T_3)}{\sqrt{[\tfrac{1}{2}n(n-1) - (T_1 + T_2)][\tfrac{1}{2}n(n-1) - (T_1 + T_3)]}}. \]

Thus, given the condition of perfect sequence, $\tau_b$ equals unity only when 
$T_2 = T_3 = 0$, or when both rankings are identically tied. 

The number of tied comparisons derived from the three sources is exactly 
the number by which the possibility of positive or negative comparisons is 
reduced. Hence the following expression for tau in the tied case is suggested, 

\[ \tau' = \frac{S}{\tfrac{1}{2}n(n-1) - T'} \tag{4} \]

where $T' = T_1 + T_2 + T_3$, 

\[ T_1 = \tfrac{1}{2}\sum t_1(t_1 - 1), \]

where $t_1$ is the number of individuals in a given set who are tied on both rankings 
and the summation runs over all sets of such ties, 

\[ T_2 = \tfrac{1}{2}\sum t_2(t_2 - 1), \]

where $t_2$ is the number of individuals in a given set who are tied on Ranking A 
but not on Ranking B, and the summation runs over all such sets. $T_3$ is the 
corresponding quantity for Ranking B. 

This expression for tau is consistent with previously obtained results for the 
cases of no ties or ties in one ranking only. If T' = 0, (4) reduces to (1), and if 
$T_1 + T_2 = T = 0$ or $T_1 + T_3 = U = 0$, (4) reduces to (3). 

It can be shown that $\tau'$ is always greater than or equal to $\tau_b$ in absolute value 
by observing that 

\[ N - T' + T_3 = N - T, \]
\[ N - T' + T_2 = N - U, \]

where N stands for $\tfrac{1}{2}n(n-1)$. Placing these values in the denominators of 
$\tau_b$ and $\tau'$, respectively, we obtain the inequality which must hold if $\tau' \ge \tau_b$, 

\[ \sqrt{(N - T' + T_3)(N - T' + T_2)} \ge N - T', \]

or 

\[ (N - T')^2 + (N - T')(T_2 + T_3) + T_2T_3 \ge (N - T')^2. \]

Since $T_1, T_2, T_3 \ge 0$ and $N \ge T'$, this inequality is valid and becomes an identity 
only when $T_2 = T_3 = 0$, that is when both rankings are identically tied. 
Since P + Q + T' = N, equation (4) can be written, 

\[ \tau' = \frac{P - Q}{P + Q}. \tag{5} \]

It is apparent from this that if Q = 0, $\tau' = 1$; if P = 0, $\tau' = -1$; and if P − Q = 0, 
$\tau' = 0$. This form should be employed in computation since it eliminates the 
need to count the number of tied comparisons. 
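
The computational form can be sketched as follows. This is an illustrative implementation, not the author's: it counts comparisons pair by pair, scores any pair tied on either ranking as zero, and for comparison also evaluates Kendall's tied-case coefficient with T and U taken as the numbers of pairs tied on each ranking. The function names and the small data set are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pair_counts(rank_a, rank_b):
    """Return (P, Q, T, U): concordant and discordant pairs, and the
    numbers of pairs tied on ranking A and on ranking B."""
    P = Q = T = U = 0
    for (a1, b1), (a2, b2) in combinations(zip(rank_a, rank_b), 2):
        if a1 == a2:
            T += 1
        if b1 == b2:
            U += 1
        if a1 != a2 and b1 != b2:
            if (a1 - a2) * (b1 - b2) > 0:
                P += 1
            else:
                Q += 1
    return P, Q, T, U


def tau_prime(rank_a, rank_b):
    """Modified tau, (P - Q) / (P + Q); undefined if every pair is tied."""
    P, Q, _, _ = pair_counts(rank_a, rank_b)
    return (P - Q) / (P + Q)


def tau_b(rank_a, rank_b):
    """Kendall's coefficient for ties in both rankings."""
    P, Q, T, U = pair_counts(rank_a, rank_b)
    n = len(rank_a)
    N = n * (n - 1) // 2
    return (P - Q) / sqrt((N - T) * (N - U))


# Perfect sequence, but the two rankings are tied in different places.
A = [1, 1, 2, 2, 3]
B = [1, 2, 2, 3, 3]
print(tau_prime(A, B), tau_b(A, B))
```

For these invented rankings Q = 0, so the modified coefficient is unity while Kendall's tied-case coefficient falls short of it, which is the behaviour the modification is designed to give.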

Goodman and Kruskal [1], in considering measures of association for con- 
tingency tables, arrived at an expression the same as (5) except that it substi- 
tuted parameters for the random variables used here. Wallis and Roberts’ 
recent book [5, p. 282] introduced this measure in the random variable form 
for the uses suggested by Goodman and Kruskal. 

As far as is known to this author, no adequate sampling distribution is 
available for this statistic. Since the sampling distribution used in connection with 
tau is that of S, the suggested change of denominator does not affect sampling 
problems of tau. However, the sampling distribution of S with large numbers of 
ties presents problems in any case. 


TWO CONFIDENCE INTERVALS FOR THE RATIO OF TWO PROBABILITIES 
AND SOME MEASURES OF EFFECTIVENESS* 


GOTTFRIED E. NOETHER 
Boston University 


1. INTRODUCTION 


It is often necessary to compare the success rate of an experimental method 
with that of a standard method. If the two success rates are denoted by $p_2$ 
and $p_1$, respectively, the problem is usually solved by finding a confidence 
interval for the difference $\Delta = p_2 - p_1$. However, $\Delta$ has the disadvantage of being 
subject to an upper bound which depends on the unknown $p_1$. It would then 
seem that often a more appropriate quantity is given by 

\[ P = \frac{p_2 - p_1}{1 - p_1} \]

if $p_2$ is suspected to be greater than $p_1$, and by 

\[ P' = \frac{p_1 - p_2}{p_1} \]

if $p_2$ is suspected to be smaller than $p_1$. P and P', which have been called 
“effectiveness indices” in [6, p. 284], measure the actual difference between $p_1$ and 
$p_2$ in terms of the maximum possible difference, and therefore are not subject 
to an unknown upper bound. 

We want to find confidence intervals for P and P'. However, before doing 
so it may be helpful to give an example of the use of each quantity. Thus, let 
$p_1$ denote the proportion of persons of average preparation who obtain passing 
grades on a given test, while $p_2$ denotes the corresponding proportion of persons 
who have had specialized training before taking the test. P then measures what 
proportion of those who would have failed the test without specialized training 
actually pass it as a consequence of such training. We may say that P measures 
the effectiveness of specialized training. Or again, let $p_1$ be the rate of attack of 
a certain disease among unvaccinated persons, while $p_2$ is the rate of attack 
among persons who have been vaccinated against the disease. 100P' then 
measures the percentage reduction in the incidence of the disease as a 
consequence of vaccination. Again we can say that P' measures the effectiveness of 
vaccination. The reader who is interested in questions of interpretation arising 
from the use of the quantities P and P' (as well as $\Delta$ and −P') in connection 
with similar problems is referred to [6] and [4]. 

The two problems of finding confidence intervals for P and P' can be solved 
by identical methods. Thus we note that 

\[ P = 1 - \frac{q_2}{q_1} \tag{1} \]

and 

\[ P' = 1 - \frac{p_2}{p_1}, \tag{2} \]

where $q_i = 1 - p_i$. It follows that if we set 

\[ \mu = \frac{\theta_2}{\theta_1}, \]

where $\theta_1$ and $\theta_2$ are the parameters of two binomial distributions, $\theta_i$, i = 1, 2, 
being equal to $q_i$ in the case of P and to $p_i$ in the case of P', confidence intervals 
for P and P' can be obtained directly from those for $\mu$. Indeed, let 
$\underline{\mu} < \mu < \bar{\mu}$ be such a confidence interval for $\mu$. Then confidence 
intervals for P and P', respectively, are given by 

\[ 1 - \bar{\mu} < P < 1 - \underline{\mu}, \qquad 1 - \bar{\mu} < P' < 1 - \underline{\mu}, \tag{3} \]

depending on whether we set $\theta_i = q_i$ or $p_i$. In addition, the length of the 
confidence interval (3) for P (or P') is equal to the length $\bar{\mu} - \underline{\mu}$ of the confidence 
interval for $\mu$. 


* Research sponsored by the Office of Naval Research under Contract Nonr 1636(00). Reproduction in whole 
or in part is permitted for any purpose of the United States Government. 

The author is obliged to Ralph Fasano and William Granet as well as to the Statistical and Research Services 
of Boston University for computational help. 

During the remainder of the paper we shall then be concerned with confidence 
intervals for the parameter $\mu$ defined above. We note that, as far as confidence 
intervals for P and P' are concerned, we could restrict ourselves to the case 
$\theta_2 < \theta_1$. However, we shall not necessarily do so. Actually, the parameter 
$-P' = (p_2 - p_1)/p_1 = P''$, say, has been used in the statistical literature for both 
$p_2 > p_1$ and $p_2 < p_1$. Obviously, 

\[ \underline{\mu} - 1 < P'' < \bar{\mu} - 1 \]

is a confidence interval for P'', if we set $\theta_i = p_i$. 
The problem of finding a confidence interval for P'' when both $p_1$ and $p_2$ are 
small has been considered by Bross [1]. 


2. CONFIDENCE INTERVALS FOR $\mu$ 


2.1. General Remarks and Notation. Throughout the paper, $k_i$, i = 1, 2, will 
be used to denote the observed number of successes in $n_i$ independent trials 
with probability $\theta_i$ of success. We shall write $y_i$ for the sample estimate $k_i/n_i$ of 
$\theta_i$ and u for the sample estimate $y_2/y_1$ of $\mu$. It will always be assumed that $n_i$ 
is sufficiently large for $y_i$ to be approximately normally distributed. From a 
practical point of view, this is not too strong a restriction, since, as we shall 
see later, the confidence intervals for $\mu$ turn out to be so long on the average as 
to be practically useless unless $n_1$ and $n_2$ are fairly large, say, at least 50. Also, for 
both $\theta_1$ and $\theta_2$ small, when $n_1$ and $n_2$ would have to be extremely large for the 
normal approximation to be useful, the method of Bross [1] based on the Poisson 
distribution is applicable. 

The two-sided normal deviate at a given significance level will be denoted by 
$\lambda$. We shall use $V_i$ to denote the relative variance of $y_i$, 

\[ V_i = \frac{\operatorname{var} y_i}{(E y_i)^2} = \frac{1 - \theta_i}{n_i\theta_i}, \]

and $v_i$ for its sample estimate, 

\[ v_i = \frac{1 - y_i}{k_i}. \]

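2.2. Special Case. We shall first consider briefly the special case when $\theta_1$ is 
assumed to be known. In this case, confidence limits for $\mu$ are obtained by 
dividing the confidence limits for $\theta_2$ by $\theta_1$. A confidence interval for $\theta_2$ is given by 

\[ \frac{n_2}{n_2 + \lambda^2}\left[y_2 + \frac{\lambda^2}{2n_2} \pm \lambda\sqrt{\frac{y_2(1 - y_2)}{n_2} + \frac{\lambda^2}{4n_2^2}}\right] \tag{4} \]

as is shown, e.g., in [2, p. 515]. Dividing by $\theta_1$ and rewriting the expression so 
as to make it more convenient for future comparisons, the confidence interval 
for $\mu$ becomes 

\[ \frac{y_2}{\theta_1}\cdot\frac{1}{1 + \lambda^2/n_2}\left[1 + \frac{\lambda^2}{2k_2} \pm \lambda\sqrt{v_2 + \frac{\lambda^2}{4k_2^2}}\right]. \tag{5} \]

If $n_2$ is so large that terms of order $1/n_2$ may be neglected, (5) reduces to 

\[ \frac{y_2}{\theta_1}\left(1 \pm \lambda\sqrt{v_2}\right), \tag{6} \]

which corresponds to the central confidence interval 

\[ y_2 \pm \lambda\sqrt{\frac{y_2(1 - y_2)}{n_2}} \tag{7} \]

for the parameter $\theta_2$. 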
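
The two kinds of interval for a single binomial parameter can be compared numerically. The sketch below assumes that the interval cited from the textbook is the usual one obtained by solving the quadratic inequality $(y - \theta)^2 \le \lambda^2\theta(1 - \theta)/n$ for $\theta$, and sets it beside the simpler central interval. The function names and the sample counts are illustrative only.

```python
from math import sqrt


def quadratic_interval(k, n, lam):
    """Limits from solving (y - theta)^2 = lam^2 * theta * (1 - theta) / n."""
    y = k / n
    shrink = n / (n + lam**2)
    centre = y + lam**2 / (2 * n)
    half = lam * sqrt(y * (1 - y) / n + lam**2 / (4 * n**2))
    return shrink * (centre - half), shrink * (centre + half)


def central_interval(k, n, lam):
    """Simple interval y +/- lam * sqrt(y * (1 - y) / n)."""
    y = k / n
    half = lam * sqrt(y * (1 - y) / n)
    return y - half, y + half


# Illustrative data: 20 successes in 50 trials, lam = 2.
q_lo, q_hi = quadratic_interval(20, 50, 2.0)
c_lo, c_hi = central_interval(20, 50, 2.0)
print(q_lo, q_hi)
print(c_lo, c_hi)
```

When the first probability is known, dividing either pair of limits by it gives the corresponding limits for the ratio.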

2.3. General Case. In the general case, a great many different confidence 
intervals are possible depending on the method of approximation. Some of 
these methods will be considered in Section 3.1. Here we want to discuss only 
the following two intervals which seem to possess certain desirable properties: 


? 1 4n, 


\[ u\left(1 \pm \lambda\sqrt{v_1 + v_2}\right). \tag{I$_2$} \]

It is seen at once that if we let $n_1 \to \infty$, holding $n_2$ fixed, (I$_1$) and (I$_2$) reduce 
to (5) and (6) respectively, provided u is now interpreted as $y_2/\theta_1$. Thus (5) 
and (6) are simply special cases of (I$_1$) and (I$_2$). Also, as in the case of (5) and 
(6), (I$_1$) reduces to (I$_2$) if $n_i$, i = 1, 2, is so large that terms of order $1/n_i$ may be 
neglected. 

Before discussing the problem of selecting one of the two intervals, we note 
that the two formulas become inapplicable if one of the $k_i$ is zero. However, the 
probability of such an event is extremely small if the assumptions made earlier 
are satisfied, namely, that both $n_1$ and $n_2$ are large and that neither $\theta_1$ nor $\theta_2$ is 
very small. 

The problem of making a choice between (I$_1$) and (I$_2$) is similar to that of 
making a choice between (4) and (7). From a computational point of view, 
(I$_2$) is certainly considerably simpler than (I$_1$). However, in choosing a 
particular interval, it is often more important to consider the length of and the actual 
amount of protection provided by the respective intervals. 

In Section 3.2, it will be shown that in general (I$_1$) provides a shorter interval 
than (I$_2$). Only for quite small or quite large values of $y_2$ can (I$_2$) produce a 
shorter interval than (I$_1$). But the difference in length will rarely have practical 
significance, as can be seen from Table I which compares the length of (I$_1$) to 
that of (I$_2$) for $n = n_1 = n_2 = 50$ and 100, $\lambda = 2$, and various combinations of $y_1$ 
and $y_2$. 


TABLE I 


LENGTH OF (I$_1$) COMPARED TO LENGTH OF (I$_2$) 
($\lambda = 2$) 


yn | .10 .90 .16 .40 .80 .20 .80 .40 .80 


50     .993   .962   .993   .990   .966   .963   1.005   .964   1.015   .981 
100    .997   .981   .997   .995   .983   .981   1.004   .981   1.009   .991 


Nominally, the protection provided by either confidence interval is given by 
the confidence coefficient selected by the investigator. However, both (I$_1$) 
and (I$_2$) are only approximate intervals in the sense that the actual protection 
provided by the intervals may be somewhat higher or lower than indicated by 
the confidence coefficient. In order to have a basis for comparisons, the author 
has computed exact protection levels for certain selected combinations of $n_1$, $n_2$, 
$\lambda$, and $\mu$. Actually, it turns out that the protection levels depend also on the 
nuisance parameters $\theta_1$ and $\theta_2$. The results are illustrated by Table II giving 
actual protection levels of (I$_1$) and (I$_2$) for $n_1 = n_2 = 50$ and 100, $\lambda = 2$ 
corresponding to a nominal confidence coefficient .954, and $\mu = 1/2$. 

Inspection of Table II reveals some interesting facts. The most important 
result seems to be that the protection provided by (I$_1$) is greater than claimed 
while the reverse is true for (I$_2$). The difference is particularly pronounced for 
small values of $\theta_1$ (and $\theta_2$). As n increases from 50 to 100, the protection levels 
of both intervals get closer to the nominal confidence coefficient as would be 
expected. 


TABLE II 
TRUE PROTECTION LEVELS OF (I$_1$) AND (I$_2$) 
(Nominal confidence coefficient .954) 

[Table body garbled in the source. Columns: ($\theta_2$, $\theta_1$) = (.10, .20), (.15, .30), (.40, .80). Rows: (I$_1$) and (I$_2$) for n = 50 and n = 100. Surviving entries, positions uncertain: .970, .931, .943, .952, .951, .959, .951.] 

Very similar results have been obtained for $\lambda = 2$, $\mu = 1/4$ and 3/4, and only 
minor deviations from the general picture have been found in the case of $\mu = 1$ 
for extreme values of $\theta_1 = \theta_2$. From partial computations with $\lambda = 2.58$ 
corresponding to a nominal confidence coefficient of .99, it seems that similar results 
can be expected. 

Considering all three criteria of selection, simplicity of computation, length 
of intervals, and protection provided by the intervals, it would seem that (I$_1$) 
has some definite advantages over (I$_2$). As $n_1$ and $n_2$ become really large, the 
difference between (I$_1$) and (I$_2$) becomes negligible, and (I$_2$) is preferable in 
view of its simplicity. 

2.4. Length of Intervals. We have already seen that (I$_1$) and (I$_2$) do not differ 
very much in their lengths. It will, therefore, be sufficient to investigate the 
length L of interval (I$_2$) in greater detail. We easily find $L = 2\lambda u\sqrt{v_1 + v_2}$, a 
quantity which depends on the observed values of $y_1$ and $y_2$. We can get more 
definite information by replacing the sample quantities by population 
parameters. In addition, we shall assume that $n_1 = n_2 = n$. Thus let 

\[ \bar{L} = \frac{2\lambda}{\sqrt{n}}\,\mu\sqrt{\frac{1 - \theta_1}{\theta_1} + \frac{1 - \theta_2}{\theta_2}}. \]

For n sufficiently large, $\bar{L}$ may be considered equal to the average length of 
(I$_2$). 

It is easily seen that, for fixed $\mu$, $\bar{L}$ increases as $\theta_1$ (and $\theta_2$) decreases. Putting 
it differently, the smaller $\theta_1$ (and $\theta_2$), the less accurate the determination of $\mu$. 

It is of some interest to compute the actual value of the factor multiplying 
$2\lambda/\sqrt{n}$ for various combinations of $\mu$ and $\theta_1$. This has been done in Table III 
for $\mu = 1/4, 1/2, 3/4, 1$ and $\theta_1 = 9/10, 3/4, 1/2, 1/4$. The average length of the 
corresponding confidence interval is of course obtained by multiplying the 
tabulated value by $2\lambda/\sqrt{n}$. 


TABLE III 
FACTORS FOR COMPUTING AVERAGE LENGTH OF (I$_2$) 

$\theta_1$ \ $\mu$    1/4               1/2               3/4                1 
1/4         √(9/8) = 1.061    √(5/2) = 1.581    √(33/8) = 2.031    √6 = 2.449 
1/2         √(1/2) = .707     1.000             √(3/2) = 1.225     √2 = 1.414 
3/4         √(7/24) = .540    √(1/2) = .707     √(5/8) = .791      √(2/3) = .816 
9/10        √(2/9) = .471     √(1/3) = .577     √(1/3) = .577      √(2/9) = .471 


As an example, consider confidence intervals having confidence coefficient 
.95 ($\lambda = 2$) based on 64 observations from each population. The average length 
of these intervals is obtained by multiplying the tabulated values by 1/2. 
If this is done, it becomes clear that the information concerning $\mu$ which can 
be gained on the basis of fewer than, say, 50 or 60 observations from each 
population is often valueless. We might just as well state that $0 < \mu < 1$, a 
statement which can be made without taking any observations if we have reason to 
believe that $\theta_2 < \theta_1$. 
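
The tabulated factors can be recomputed from the relative variances. The sketch below assumes the factor is $\mu\sqrt{(1-\theta_1)/\theta_1 + (1-\theta_2)/\theta_2}$ with $\theta_2 = \mu\theta_1$, which is what results when the sample quantities in the length of the simpler interval are replaced by population values and both samples have size n. The function names are illustrative.

```python
from math import sqrt


def length_factor(mu, theta1):
    """Factor to be multiplied by 2 * lam / sqrt(n); theta2 = mu * theta1."""
    theta2 = mu * theta1
    return mu * sqrt((1 - theta1) / theta1 + (1 - theta2) / theta2)


def average_length(mu, theta1, n, lam):
    """Approximate average length of the simpler interval for equal n."""
    return 2 * lam / sqrt(n) * length_factor(mu, theta1)


# Reproduce the table row by row (theta1 = 1/4, 1/2, 3/4, 9/10).
for theta1 in (0.25, 0.5, 0.75, 0.9):
    row = [round(length_factor(mu, theta1), 3) for mu in (0.25, 0.5, 0.75, 1.0)]
    print(theta1, row)

# With lam = 2 and n = 64 the multiplier 2 * lam / sqrt(n) equals 1/2.
print(average_length(1.0, 0.25, 64, 2.0))
```

With 64 observations per sample and a smaller first probability, the average length is of the same order as the ratio itself, which is the sense in which such intervals carry little information.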

2.5. Examples. Let us return to the two examples mentioned in the 
Introduction and compute actual confidence intervals on the basis of numerical 
information. Thus, assume that out of a group of 60 persons with standard 
preparation, 25 pass the test; while out of a group of 50 persons who have had 
specialized instruction, 30 pass the test. We want to find a confidence interval 
for P. According to (1), $\theta_i = q_i$. Instead of working with the number of persons 
who pass the test, we have to work with the number of persons who fail the 
test. Thus, $n_1 = 60$, $k_1 = 60 - 25 = 35$, $n_2 = 50$, $k_2 = 50 - 30 = 20$. Since both $n_1$ 
and $n_2$ are relatively small, it seems appropriate to use (I$_1$). We easily find 
$y_1 = 35/60$, $y_2 = 20/50$, $u = y_2/y_1 = 24/35$. Remembering that $v_i = (1 - y_i)/k_i$, we 
find further $v_1 = 1/84$, $v_2 = 3/100$. Substituting these values into (I$_1$) and using 
$\lambda = 2$ to give us approximately a .95-confidence interval, we obtain 
$\underline{\mu} = .43$, $\bar{\mu} = .97$. Finally, by (3), the confidence interval for P is given by 

1 − .97 < P < 1 − .47 

or 

.03 < P < .53. 

Thus we can say that specialized training is between 3 and 53 per cent 
effective in preventing failure on the test. 

As the second example, assume that in a control group of 1000 persons we 
observe 290 cases of a particular disease, while in another group of 1000 persons 


1/4 

1/2 
3/4 
9/10 

or 


42 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1957 


who have been vaccinated against this disease only 95 cases are observed. We
want to find a confidence interval for $P'$. According to (2), $\theta_i=p_i$, and,
therefore, $n_1=n_2=1000$, $k_1=290$, $k_2=95$. Since both $n_1$ and $n_2$ are large, we shall use
(I$_2$). We easily find $u=95/290$, $v_1=710/290{,}000$, $v_2=905/95{,}000$. Using again
$\lambda=2$ to give us a .95-confidence interval, we find $\underline{\mu}=.256$, $\bar{\mu}=.399$. Again by
(3),


$$1 - .399 < P' < 1 - .256$$


$$.601 < P' < .744.$$

Thus we may say that vaccination is between 60.1 and 74.4 per cent effective.
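
The two numerical examples can be checked with a short Python sketch. It assumes that (I$_2$) is $u(1 \pm \lambda\sqrt{v_1+v_2})$, as stated in Section 3.1, and that (I$_1$) consists of the two roots of the quadratic in $\mu$ obtained there from $z_3$; the coefficients `a`, `b`, `c` follow our reading of that derivation rather than a formula quoted in this section, and the function names are illustrative only.

```python
import math

def interval_i2(k1, n1, k2, n2, lam=2.0):
    """Interval (I2): u * (1 -/+ lam * sqrt(v1 + v2))."""
    y1, y2 = k1 / n1, k2 / n2
    u = y2 / y1
    v1, v2 = (1 - y1) / k1, (1 - y2) / k2
    half = lam * math.sqrt(v1 + v2)
    return u * (1 - half), u * (1 + half)

def interval_i1(k1, n1, k2, n2, lam=2.0):
    """Interval (I1), assumed to be the roots of a*mu^2 - 2*b*mu + c = 0."""
    y1, y2 = k1 / n1, k2 / n2
    u = y2 / y1
    v1 = (1 - y1) / k1
    a = 1 + lam**2 / n2
    b = u + lam**2 / (2 * n2 * y1)
    c = u**2 * (1 - lam**2 * v1)
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

# First example: failures 35 of 60 (standard) and 20 of 50 (specialized).
test_lo, test_hi = interval_i1(35, 60, 20, 50)
# Second example: 290 of 1000 (control) and 95 of 1000 (vaccinated).
vacc_lo, vacc_hi = interval_i2(290, 1000, 95, 1000)
print(round(test_lo, 2), round(test_hi, 2))
print(round(vacc_lo, 3), round(vacc_hi, 3))
```

Under these assumptions the sketch returns bounds of about .43 and .97 for the first example and .256 and .399 for the second; the limits for $P$ and $P'$ then follow from (3) by subtracting each bound from 1.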


3. THEORETICAL BASIS 


3.1. Derivation of Intervals. In trying to find a confidence interval for the
parameter $\mu$, we may start from two statistics,

$$u = y_2/y_1,$$

which is the maximum likelihood estimate of $\mu$, and

$$t = y_2 - \mu y_1.$$

Since we want to derive asymptotic confidence intervals for the parameter
$\mu$, we have to consider the asymptotic distributions of $u$ and $t$. Let $z$ stand for
a chance variable having a normal distribution with mean 0 and variance 1.
It then follows immediately that asymptotically

$$z_1 = \frac{y_2 - \mu y_1}{\sqrt{\sigma_2^2 + \mu^2\sigma_1^2}}, \qquad \sigma_i^2 = \frac{\theta_i(1-\theta_i)}{n_i}, \quad i = 1, 2.$$

Applying the so-called delta-method to the distribution of $u$, we find (see, e.g.,
[5, vol. 2, p. 107]) that asymptotically

$$z_2 = \frac{u - \mu}{\mu\sqrt{V_1 + V_2}}.$$

Finally, we can make use of a theorem due to Geary [3] which states that
asymptotically

$$z_3 = \frac{\theta_1(u - \mu)}{\sqrt{\sigma_1^2u^2 + \sigma_2^2}}.$$

Inspection of the three $z$-statistics reveals that the denominator of each
depends on the parameters $\theta_1$ and $\theta_2$. The standard procedure for finding
confidence intervals under such conditions is to replace the unknown parameters by
their maximum likelihood estimates. In the case of $z_2$, this gives

$$\frac{u - \mu}{u\sqrt{v_1 + v_2}}$$

or

$$\frac{y_2 - \mu y_1}{y_2\sqrt{v_1 + v_2}} = z, \text{ say.}$$


Similarly, both $z_1$ and $z_3$ reduce to the same expression $z$. Setting $z=\pm\lambda$ and
solving for $\mu$, we find the extremely simple confidence interval

$$u(1 \pm \lambda\sqrt{v_1 + v_2}). \qquad (\mathrm{I}_2)$$

The confidence interval (I$_2$) has been obtained by replacing $\theta_i$ by $y_i$ (including
$\mu=\theta_2/\theta_1$ by $y_2/y_1=u$) in the denominators of the three $z$-statistics. Since we are
trying to find a confidence interval for $\mu$, a more logical procedure would
certainly be first to express either $\theta_1$ or $\theta_2$ in terms of $\mu$ and the remaining parameter
$\theta$ and then to replace only this remaining nuisance parameter by its sample
estimate. We have then left an expression in $\mu$ alone which can be solved for $\mu$.
In this way we can find six more confidence intervals, two from each $z$-statistic.

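We shall illustrate this method by deriving (I$_1$) from $z_3$. Setting $z_3$ equal to
$\pm\lambda$ and squaring gives

$$(u - \mu)^2 = \lambda^2(u^2V_1 + \mu^2V_2).$$

Substituting $\mu\theta_1$ for $\theta_2$ and replacing $\theta_1$ by its sample estimate $y_1$ leads to the
quadratic equation in $\mu$

$$\mu^2\left(1 + \frac{\lambda^2}{n_2}\right) - 2\mu\left(u + \frac{\lambda^2}{2n_2y_1}\right) + u^2(1 - \lambda^2v_1) = 0$$

whose solutions $\underline{\mu}$ and $\bar{\mu}$ give (I$_1$).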

As already mentioned, by applying similar methods, it is possible to find 5
additional intervals. They have not been given in this paper since it can be
shown that they produce intervals which are longer, some even considerably
longer, than (I$_1$) and in general are as complicated from a computational point
of view.

3.2. Comparison of Lengths of (I$_1$) and (I$_2$). Let $L_1$ and $L_2$ stand for the length
of (I$_1$) and (I$_2$), respectively. We find

[display equation illegible in source]

Setting $n_2=n$, $n_1=cn$ and neglecting terms of order $1/n^2$, we find after some
simplification

[display equation illegible in source]

where $E = [4(1+2c)y_2^2 - 8cy_2 + c]y_1 - 4y_2^2$. For (I$_2$) to be shorter than (I$_1$), we
must have $E>0$ or

$$y_1 > \frac{4y_2^2}{4(1+2c)y_2^2 - 8cy_2 + c}$$

where the denominator of the right side must be positive. Since $y_1<1$, we get
the least restrictive limits of $y_2$-values for which this relationship is possible
by setting the right side equal to 1 and solving for $y_2$. This gives the quadratic
equation in $y_2$

$$8y_2^2 - 8y_2 + 1 = 0$$

whose solutions are

$$\tfrac{1}{2}\left(1 \pm \sqrt{1/2}\right)$$

or .15 and .85, approximately. Thus a necessary, but by no means sufficient,
condition for $L_2<L_1$ is that $y_2<.15$ or $y_2>.85$. It follows that in general (I$_1$)
will give a shorter confidence interval than (I$_2$).

3.3. Determination of Exact Protection Levels of (I$_1$) and (I$_2$). We shall now
describe how the exact protection levels associated with (I$_1$) and (I$_2$) can be
computed. The actual derivation will be in terms of (I$_2$). Only the final equation
for (I$_1$) will be given.

(I$_2$) is of the form

$$u(1 - \lambda\sqrt{v_1 + v_2}) < \mu < u(1 + \lambda\sqrt{v_1 + v_2}),$$

which can be rewritten as

$$(u - \mu)^2 < \lambda^2u^2(v_1 + v_2). \qquad (8)$$

For a given value of $\mu$ (and $\lambda$, $n_1$, $n_2$), (8) determines those values of $(y_1, y_2)$
which lead to the inclusion of the particular value $\mu$ in the corresponding
confidence interval. Thus the exact protection level associated with (I$_2$) is given
by $\alpha=\sum P(y_1, y_2)$ where summation is over the points $(y_1, y_2)$ satisfying (8).

For simplicity, we shall assume that $n_1=n_2=n$. If we replace the inequality
sign in (8) by an equality sign, we get after some rearranging of terms

$$(nk - n\lambda^2 + 2\lambda^2k)h^2 - nk(2\mu k + \lambda^2)h + n\mu^2k^3 = 0$$

where $k=k_1=ny_1$, $h=k_2=ny_2$, a quadratic equation in $h$ whose roots are

$$h = \frac{nk(\mu k + \tfrac{1}{2}\lambda^2) \pm \lambda k\sqrt{n[n\mu(1+\mu)k - 2\mu^2k^2 + \tfrac{1}{4}n\lambda^2]}}{(n + 2\lambda^2)k - n\lambda^2}. \qquad (9)$$


$\alpha$ can now be written as

$$\alpha = \sum_{k=1}^{n}\ \sum_{h_1<h<h_2} B(k; n, \theta_1) \times B(h; n, \theta_2) \qquad (10)$$

where

$$B(g; n, \theta) = \binom{n}{g}\theta^g(1-\theta)^{n-g}$$

is the probability of observing $g$ successes in $n$ independent trials with
probability $\theta$ of success. In this form, $\alpha$ is easily evaluated with the help of existing tables
of the binomial distribution.
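
The summation in (10) can also be carried out directly by machine. The following Python sketch is a minimal illustration, not the tabular procedure described above: instead of solving for the roots $h_1$ and $h_2$, it tests the inclusion condition for (I$_2$), $(u-\mu)^2 < \lambda^2u^2(v_1+v_2)$, at every point $(k, h)$. It assumes $n_1=n_2=n$ and skips points with $k=0$ or $h=0$, where $u$ or $v_2$ is undefined; the function names are hypothetical.

```python
from math import comb

def binom_pmf(g, n, theta):
    """B(g; n, theta): probability of g successes in n independent trials."""
    return comb(n, g) * theta**g * (1 - theta)**(n - g)

def covers_i2(k, h, n, mu, lam):
    """True if the interval (I2) computed from k and h (n trials each) contains mu."""
    u = h / k
    v1 = (n - k) / (n * k)   # (1 - y1)/k1 with y1 = k/n
    v2 = (n - h) / (n * h)   # (1 - y2)/k2 with y2 = h/n
    return (u - mu) ** 2 < (lam * u) ** 2 * (v1 + v2)

def protection_level_i2(n, theta1, theta2, lam=2.0):
    """Sum B(k; n, theta1) * B(h; n, theta2) over points whose interval covers mu."""
    mu = theta2 / theta1
    alpha = 0.0
    for k in range(1, n + 1):
        pk = binom_pmf(k, n, theta1)
        for h in range(1, n + 1):
            if covers_i2(k, h, n, mu, lam):
                alpha += pk * binom_pmf(h, n, theta2)
    return alpha

print(protection_level_i2(50, 0.5, 0.5))
```

Because the inclusion test depends on $\theta_1$ and $\theta_2$ only through $\mu$, the set of covered points could be computed once for a given $\mu$ and reused for several pairs $(\theta_1, \theta_2)$ with the same ratio.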

For (I$_1$), the corresponding roots are

$$h = \frac{n\mu k^2 \pm \lambda\sqrt{-(2n+\lambda^2)\mu^2k^4 + n(n+\lambda^2)\mu(1+\mu)k^3 - n^2\lambda^2\mu k^2}}{(n+\lambda^2)k - n\lambda^2}. \qquad (11)$$

It is at once obvious from (10) that the value of $\alpha$ depends on the particular
values of $\theta_1$ and $\theta_2$. On the other hand, the roots $h_1$ and $h_2$ given by equations
(9) and (11) depend on $\mu$ (as well as $n$ and $\lambda$) alone. Thus a set of $h_1$- and
$h_2$-values corresponding to some given value $\mu$ can be used to compute several values
of $\alpha$ corresponding to different combinations of $\theta_1$ and $\theta_2$, as long as $\theta_2/\theta_1=\mu$.


A SYSTEMATIC METHOD OF FINDING DEFINING CONTRASTS


Crark Jr. 
Gulf Research & Development Company 


INTRODUCTION 


This paper is limited to factorial designs where each variable is to be measured
at two levels. While the technique of fractional replication has been
developed [4, 6] for variables containing three and more levels, this is less
satisfactory. Further, by breaking down a variable into 2, 3, 4, ..., n pseudo
variables, the main variable can be observed at 4, 8, 16, ..., $2^n$ levels while
retaining the form of a 2-level design. This technique has been referred to by Davies
[4], and illustrations have appeared in the literature [1, 5].

In a factorial experiment containing n variables, $2^n$ runs are required. There
will be $2^n-1$ main effects and interactions. Often information on high-order
interactions is not desired. Since these are often negligible, it would be useful
to cut down the number of runs by sacrificing this information. This can be
done by employing fractional replication [2, 3, 7, 8]. Thus we may speak of a
$2^{n-p}$ factorial experiment. For p=2 we would have a $(\tfrac{1}{2})^2$ or a one-quarter
replicated experiment.

If only one-fourth of a given number of runs is made, only one-fourth as 
much information can be expected. In a quarter-replicated experiment, instead 
of being able to estimate all effects and interactions independently, each will 
be confused or “confounded” with three other effects or interactions. The trick 
in using fractional factorials is to design so that the effects or interactions of 
interest are confounded with those which are not of interest, and further, which 
are expected to be small. We will then be able to assume that we have main 
effect A, for example, and not any of the high-order interactions with which it 
is confounded. 

Generalizing, in a $2^{n-p}$ design, including the mean there are $2^n$ effects. The
number of runs is only $2^{n-p}$; hence, each run must estimate $2^n/2^{n-p}=2^p$ effects.
These $2^p$ effects estimated by each run will be mutually confounded.

It has been pointed out [4, 6] that all of the confounding is fixed by the
selection of the set of “defining contrasts.” Each element in the set of defining
contrasts is one of the $2^n$ factorial effects. Of the $2^p$ elements in the set of
defining contrasts, the identity is one, p others may be freely chosen but must be
independent, and the remaining $2^p-p-1$ are determined in consequence. The
p elements which are chosen arbitrarily are called generators. They are
independent if none may be formed as a product of any of the others. The remaining
$2^p-p-1$ elements are found by forming all possible products of the generators
two at a time, three at a time, and so forth, subject to the condition that
$A^2=B^2=\cdots=I$.

A useful rule for designing fractional factorial experiments would be: All
main effects plus as many as possible of the two-factor interactions must be
estimated, confounded only with high-order or negligible interactions. Many
other criteria might be selected and all can be handled according to the method 
to be outlined here. A theoretical minimum of one experimental run must be 
made for each effect to be estimated. In practice it is usually not possible to 
achieve this minimum; also, one wishes to allow a number of runs for the esti- 
mation of experimental error. 

For p=1 or 2 the set of defining contrasts to satisfy the criterion being used 
can be written without trouble. For p=3 or higher this becomes harder, since 
when the generators are written down, it is not apparent what the remaining 
terms will be until they are formed by multiplication. 

To appreciate the difficulty, we may set up a $2^{10-4}$ factorial experiment. Since
this is a 1/16 replicate, there will be 16 terms in the set of defining contrasts.
There will be four generators, which we may select arbitrarily. Suppose we
choose ABCEI, ABDFJ, BDEGI, DFGHJ as the generating terms. These all
include five letters; hence, these terms will cause no two-factor interactions to
be confounded with main effects or with each other. However, in forming the
remaining members of the set of defining contrasts we find that the product
of the first and third generator gives ACDG, the product of the second and
fourth gives ABGH, and the product of all four gives BCDH. These 3
four-letter terms show that the two-factor interactions


AB AC AD AG AH
BC BD BG BH CD
CG CH DG DH GH


will be mutually confounded with one or more of their own number. The best
$2^{10-4}$ design thus far found contains only 2 four-letter terms in the set of
defining contrasts. It is believed that the reader will have some difficulty in finding
this design by the conventional trial-and-error method.
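
The multiplication described above is easily mechanized. The sketch below is an illustration only (the function names are our own): it treats each effect as a set of letters, so that the rule $A^2=I$ becomes a symmetric difference, and forms every product of the four generators quoted in the text.

```python
from itertools import combinations

def product(*effects):
    """Multiply effects under A*A = I: letters occurring an odd number of times survive."""
    letters = set()
    for effect in effects:
        letters ^= set(effect)
    return "".join(sorted(letters))

def defining_contrasts(generators):
    """Products of the generators taken 1, 2, ..., p at a time (identity excluded)."""
    members = []
    for size in range(1, len(generators) + 1):
        for combo in combinations(generators, size):
            members.append(product(*combo))
    return members

generators = ["ABCEI", "ABDFJ", "BDEGI", "DFGHJ"]
members = defining_contrasts(generators)
four_letter = [m for m in members if len(m) == 4]
# Two-factor interactions whose letters both lie in some four-letter member.
confounded = sorted({"".join(pair) for m in four_letter for pair in combinations(m, 2)})
print(len(members), four_letter)
print(confounded)
```

For these generators the sketch lists 15 members, of which ACDG, ABGH, and BCDH have four letters, in agreement with the products worked out above.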


THE PROPOSED METHOD 


In the proposed systematic method, each of the n letters involved is con- 
sidered separately. The trial selection of p generator terms is not the starting 
point. 

The first step is to form a table, whose columns represent all possible appear- 
ances of a single letter in the p generators. A letter may appear from 1 up to 
p times in the p generators. The rows of the table represent the members of the 
set of defining contrasts (excluding the identity). The body of the table con- 
tains a check mark for each appearance of a letter. A letter may or may not 
appear in a given member of the set of defining contrasts according to which of 
the generators contain it. 

After check marks have been assigned to the p generators in all possible 
combinations to start the “possible effect assignments” columns, the remainder 
of the columns can be filled in by the ordinary rules of multiplication. Such 
tables for p=3, 4, and 5 are given in Tables I, II, and III. In the left-hand 
margin is indicated the multiplication performed to fill each row. Denoting the 
three generators of Table I by $G_1$, $G_2$, and $G_3$, the presence of a letter by A and
the absence of a letter by the numeral 1, we would proceed, for example, to fill
out the fifth column as follows:

    G1 = A
    G2 = 1
    G3 = A
    G1·G2 = A·1 = A
    G1·G3 = A·A = A² = 1
    G2·G3 = 1·A = A
    G1·G2·G3 = A·1·A = A² = 1.


TABLE I
POSSIBLE EFFECT ASSIGNMENTS FOR ONE-EIGHTH REPLICATES
p=3

                    Possible Effect Assignments
                    1   2   3   4   5   6   7
Generators
  G1                X           X   X       X
  G2                    X       X       X   X
  G3                        X       X   X   X
Products
  G1·G2             X   X           X   X
  G1·G3             X       X   X       X
  G2·G3                 X   X   X   X
  G1·G2·G3          X   X   X               X


TABLE II
POSSIBLE EFFECT ASSIGNMENTS FOR ONE-SIXTEENTH REPLICATES
p=4
[Table body illegible in source.]


TABLE III
[Table body illegible in source; columns numbered 1 to 31.]


The remaining columns are filled in an identical fashion. The columns are
headed “Possible Effect Assignments,” since a letter may be assigned according
to any one of the columns listed, and since there is no other possible assignment
which could be made. Referring to Table I under Column 1, we see that if a 
letter is assigned to the first generator but not to the second or third generator, 
then that letter will also appear in the fourth, fifth, and seventh members of 
the set of defining contrasts (neglecting the identity). Similarly, Column 7 
indicates that if a letter appears in all three generators it will appear again 
only in the final member. 
The number of possibilities or columns will be given by the expression:

$$\sum_{i=1}^{p}\binom{p}{i}.$$

The number of members (excluding the identity) in the set of defining contrasts
is:

$$2^p - 1.$$

Since

$$\sum_{i=1}^{p}\binom{p}{i} = 2^p - 1,$$

this table will always be a square matrix. It will be seen, as a matter of interest,
that the number of entries in any row or any column is equal to $2^{p-1}$.

The second step in the proposed systematic method is to select from the pos- 
sible effect assignments the columns to be used, and to assign to each of these 
a single letter corresponding to one of the variables under consideration. In all 
of the present discussion, we are implying as an arbitrary criterion that we will 
not accept any set of defining contrasts containing members of three or fewer 
letters, and that we wish to minimize the number of members containing as 
few as four letters. 

If the number of variables, n, happens to be equal to the number of possible
effect assignments, then we assign one letter to each of the columns of the
appropriate table. For example, if we were setting up a $2^{7-3}$ factorial, we would
assign in Table I the letter A to Column 1, the letter B to Column 2, and so
forth. The set of defining contrasts (excluding the identity) would then be read
directly from Table I, the terms being ADEG, BDFG, CEFG, and so forth. It
is not possible to construct a better set than this, since if we replace any column
by another column we will create 2 three-letter elements.
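
The construction of the table and the reading-off of a design can be sketched in Python as follows. This is an illustration under stated assumptions: rows and columns are both indexed by the non-empty subsets of the p generators, ordered by size and then lexicographically, an ordering that agrees with the description of Table I for p=3 but is only assumed for larger p; the function names are hypothetical.

```python
from itertools import combinations

def assignment_table(p):
    """Return (subsets, table); table[r][c] is True where a check mark belongs.

    Column c is the set of generators containing a letter; row r is the set of
    generators multiplied together. The letter survives in the product when it
    occurs in an odd number of the factors.
    """
    subsets = [frozenset(s) for size in range(1, p + 1)
               for s in combinations(range(p), size)]
    table = [[len(row & col) % 2 == 1 for col in subsets] for row in subsets]
    return subsets, table

def read_design(table, letters_by_column):
    """Read each member of the set of defining contrasts from the check marks."""
    return ["".join(letter for letter, mark in zip(letters_by_column, row) if mark)
            for row in table]

subsets, table = assignment_table(3)
design = read_design(table, "ABCDEFG")   # letter A to Column 1, B to Column 2, ...
print(design)
```

With one letter per column of the p=3 table, the first three members read ADEG, BDFG, and CEFG, and every member has four letters, as stated in the text.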

If the number of variables, n, is greater than the number of possible effect
assignments, we assign the first $2^p-1$ letters according to the columns of the
appropriate table. If the number of letters is one greater, it is immaterial how 
we assign the additional letter. All columns contain an equal number of check 
marks, and the same number of elements in the set of defining contrasts will be 
increased by one letter each no matter which column is selected. If the number 
of letters is two greater, it is immaterial which two columns are chosen. It will 
be seen upon inspection of Tables I, II, and III that any combination of two 
columns will add zero letters, one letter, and two letters the same number of 
times to members of the set of defining contrasts. 

If the number of letters is three greater, the first two columns are assigned 
arbitrarily as discussed above, and the third column is picked to have as many 
check marks as possible in the blank spaces of the first two extra columns. Addi- 
tional letters can be added consecutively, choosing columns so as to eliminate 
further blank spots or to reduce the number of elements in the set of defining 
contrasts having as few as four letters. The use of the tables as described makes 
the process an easy one to carry out by inspection. 

Another possibility is that we may wish a design where the number of letters, 
n, is less than the number of possible effect assignments. For example, we may 
wish to use a $2^{10-4}$ factorial. From Table II we see that the problem is to delete
five columns and assign ten letters to the remaining columns, again with the 
aim of minimizing the number of elements in the set of defining contrasts con- 
taining as few as four letters. By reasoning similar to that used above, it is im- 
material which two columns are deleted first. The third column to be deleted 
is selected to contain as many as possible of its check marks in the blank spaces 
of the first two columns deleted. Similarly, the fourth and succeeding columns 
to be deleted are picked to have as many as possible of their check marks in the 
sparse areas of the columns already deleted. If the number of variables, n, 
is less than half of the number of possible effect assignments, then it would be 
more profitable to select columns to be used rather than columns to be deleted. 
Thus for a $2^{11-5}$ design, we would select columns consecutively from Table III.
The first two columns may be picked arbitrarily; successive columns are then 
selected to reduce blank spots in the first columns, and after all blank spots are 
eliminated to reduce as far as possible the number of elements containing as 
few as four letters. 
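
For small cases the selection of columns can be checked by exhaustive search rather than by inspection. The sketch below is not the step-by-step procedure described above; it simply tries every choice of n distinct columns, discards any choice that gives a member of three or fewer letters, and keeps the one with the fewest four-letter members. The column ordering is the same assumed ordering (by subset size, then lexicographically), and all names are illustrative.

```python
from itertools import combinations

def assignment_table(p):
    """Check-mark table indexed by the non-empty subsets of the p generators."""
    subsets = [frozenset(s) for size in range(1, p + 1)
               for s in combinations(range(p), size)]
    return [[len(row & col) % 2 == 1 for col in subsets] for row in subsets]

def member_lengths(table, columns):
    """Number of letters in each member when one letter is assigned per chosen column."""
    return [sum(row[c] for c in columns) for row in table]

def best_columns(p, n):
    """Exhaustive search: no member under four letters, fewest four-letter members."""
    table = assignment_table(p)
    best = None
    for columns in combinations(range(len(table)), n):
        lengths = member_lengths(table, columns)
        if min(lengths) < 4:
            continue
        four_letter = lengths.count(4)
        if best is None or four_letter < best[0]:
            best = (four_letter, columns)
    return best

four_letter, columns = best_columns(4, 10)
print(four_letter, [c + 1 for c in columns])   # columns reported 1-based
```

For p=4 and n=10 only 3003 selections have to be examined, so the search is immediate; the count of four-letter members it reports can be compared with the figure quoted earlier for the best design found. For p=5 the number of selections grows quickly and the column-by-column procedure of the text becomes the more practical route.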

As the final step in the proposed systematic method, after the correct number 
of columns has been selected according to the above suggestions, then a single 
letter may be assigned to each of the columns selected. The members of the set 
of defining contrasts may then be read directly from the table of check marks. 

It is immaterial, as far as the design is concerned, which letters are assigned 
to particular columns. Assignment of letters to particular experimental vari- 
ables, however, may be quite important. In designs where certain two-factor 
interactions are mutually confounded, we will wish to assign variables to these
letters so that the confounding will have a minimum effect on our conclusions.
Usually some of the two-factor interactions can be dismissed as unimportant 
on logical grounds. Hence, the presence of a moderate amount of mutual con- 
founding of two-factor interactions is not a great disadvantage, because the 
confounded interactions of our design may be assigned to the variables where 
interactions can be neglected. 

It should be pointed out that if a four-factor interaction appears in the set
of defining contrasts, for example ABCD, not all two-factor interactions involving
these four letters are lost. For example, the AE, BG, and other similar
interactions can be estimated. It is only the pairs of two-factor interactions, all four
of whose letters appear in the four-factor interaction, which are confounded.


ON THE INDEPENDENCE OF TESTS OF RANDOMNESS AND
OTHER HYPOTHESES 


I. RICHARD SAVAGE
National Bureau of Standards and Stanford University 


Under the hypothesis of randomness, rank order statistics and symmetric
statistics are shown to be independent. This fact is of use in testing
hypotheses, as is illustrated by examples.


1. INTRODUCTION


A common problem in statistical inference is “Do the observations $x_1,\ldots,x_n$
come from a population with a prescribed mean value?” When the
observations are normally distributed, a solution to this problem is to use the “t” test.
Notice that it is not assumed that the observations represent independently and
identically distributed random variables. In this paper it will be pointed out
that for random samples, i.e., independently and identically distributed random
variables, many of the non-parametric tests of randomness are independent of
“symmetric” tests of hypotheses. This result can be used to test both the
“random” and “parametric” parts of a hypothesis with procedures having
known significance levels.

As an application one might wish to test the hypothesis that $x(t)$ is an
observation on a Wiener Process [1]. If the null hypothesis is true, then the
quantities $x(i\delta)-x((i-1)\delta)$ $(i=1,\ldots,N)$ form a random sample from a normal
distribution with mean zero and variance proportional to $\delta$. To test this
hypothesis one must test for both randomness and normality. The test for
randomness would depend on the alternatives of interest; perhaps rank correlation
[2] would be found suitable. The test of normality could be performed using
the classical chi-square goodness-of-fit test with one parameter estimated. If
this were the test program, the rank correlation test and the chi-square test
would be independent under the null hypothesis.


2. ONE SAMPLE 


If $X_1,\ldots,X_N$ are independently and identically distributed random
variables with a continuous cumulative distribution function, then symmetric
statistics and rank order statistics are independently distributed. A symmetric
statistic is a symmetric function¹ of the observations. A rank order statistic is
a function of the ranks $(R_1,\ldots,R_N)$ where $R_i$ is the number of observations,
$(X_1,\ldots,X_N)$, whose magnitude is less than or equal to that of $X_i$. The
assumption of continuity precludes the occurrence of tied observations. The
assumption of independence may be replaced by the assumption that the
random variables $X_1,\ldots,X_N$ have a symmetric cumulative distribution function.

This is verified as follows. Given a value $c$ of the symmetric statistic $f(x)$ the
conditional probabilities of the rank orders are equal, i.e.,
$P(R_1=r_1,\ldots,R_N=r_N \mid f(x)=c)=1/N!$ where $(r_1,\ldots,r_N)$ is a permutation of the first N integers.


¹ The function $f(x_1,\ldots,x_N)$ is symmetric if $f(x_1,\ldots,x_N)=f(x_{i_1},\ldots,x_{i_N})$ for every permutation of the
first N integers, $(i_1,\ldots,i_N)$. The statistics $\bar{x}$, $s$, and $t$ are typical symmetric functions.

Since this is true for those values of the symmetric statistic which do not
imply tied observations, the result follows.

In practice tied observations will occur and the following randomization
procedure is then useful. If, for instance, the second and fifth observations both
equal A, no other observation equals A, and there are 6 observations less than
A, the procedure would be to choose $R_2=7$ and $R_5=8$ with probability one half
and to choose $R_2=8$ and $R_5=7$ with probability one half.
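
The randomization procedure for ties can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name: the indices are first shuffled and then sorted by value with a stable sort, so that tied observations keep their shuffled order and every ordering within a tied group is equally likely.

```python
import random

def randomized_ranks(values, rng=random):
    """Ranks 1..N, with ties broken at random (each tie ordering equally likely)."""
    order = list(range(len(values)))
    rng.shuffle(order)                     # random order decides among ties
    order.sort(key=lambda i: values[i])    # stable sort keeps shuffled order within ties
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

# Second and fifth observations tied, with six observations below them.
values = [1, 10, 2, 3, 10, 4, 5, 6]
print(randomized_ranks(values, random.Random(0)))
```

In this example the second and fifth observations receive ranks 7 and 8 in one order or the other, each with probability one half, and all other ranks are determined.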

The following examples show how the principle of this section may be ap- 
plied. 

(1) We wish to test the hypothesis that observations $x_1,\ldots,x_n$ (made in
that order) come from a distribution with median zero. The possible alterna- 
tives are that the median is greater than zero, or that the median is shifting 
toward larger values as the successive observations are made. To test this 
hypothesis one could use the sign test [8] and a statistic proposed by Mann 
[5]. Large values of either of these statistics would be used to reject the null 
hypothesis. The sign test statistic is a symmetric statistic and the Mann statis- 
tic is a rank order statistic and thus under the null hypothesis they are inde- 
pendent. Also the sample median could be used as an estimator of the popula- 
tion median. The conditional distribution of this estimator will not depend on 
the value of the Mann statistic if the observations are independently and identi- 
cally distributed. 

In order to make a test of significance exactly at the $\alpha$ level, using these two
statistics, there are available several techniques. The Fisher $\chi^2$ technique for
combining tests of significance is one method. Discussions of this method for
situations where the test statistics have discrete distributions have been
published [4, 7, 9]. Another technique is to choose levels of significance $\alpha_1$ and $\alpha_2$
for the two tests. Then the probability of rejecting the null hypothesis (when
true) by at least one of the tests is $1-(1-\alpha_1)(1-\alpha_2)=\alpha_1+\alpha_2-\alpha_1\alpha_2$. Since $\alpha_1$
and $\alpha_2$ are at our disposal we can usually pick a combination of them so that
$\alpha_1+\alpha_2-\alpha_1\alpha_2=\alpha$, a desired significance level. (Even if the test statistics were not
independent, using this procedure the true level of significance, $\alpha$, would satisfy
the inequalities

$$\max(\alpha_1, \alpha_2) \le \alpha \le \alpha_1 + \alpha_2.)$$


These techniques for combining independent tests can be used in the other
examples.
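
The arithmetic of the two combining techniques can be sketched in Python. The sketch is illustrative and makes two assumptions beyond the text: the choice $\alpha_1=\alpha_2$, which is only one of many admissible combinations, and the usual form of the Fisher statistic, $-2\sum\log p_i$, referred to $\chi^2$ with $2m$ degrees of freedom for $m$ independent tests with continuous statistics. The function names are our own.

```python
import math

def combined_level(alpha1, alpha2):
    """Level of 'reject if either test rejects' for two independent tests."""
    return 1 - (1 - alpha1) * (1 - alpha2)

def equal_component_level(alpha):
    """alpha_1 = alpha_2 that gives combined level alpha under independence."""
    return 1 - math.sqrt(1 - alpha)

def fisher_statistic(p_values):
    """Fisher's combination statistic, -2 * sum(log p_i)."""
    return -2 * sum(math.log(p) for p in p_values)

a = equal_component_level(0.05)
print(a, combined_level(a, a))
print(fisher_statistic([0.04, 0.20]))
```

With a desired level of .05 the equal split gives component levels slightly above .025, and the bounds $\max(\alpha_1,\alpha_2)\le\alpha\le\alpha_1+\alpha_2$ can be verified directly for the values returned.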

(2) Hotelling and Pabst [2, Sec. 8] consider a situation of the following type.
KN observations are made and each is given one of K possible classifications.
A test of the hypothesis that the classifications have equal probability is based
on the statistic

$$\chi^2 = \sum_{i=1}^{K}\frac{(N_i - N)^2}{N}$$

where $N_i$ is the number of observations given the classification $i$. This statistic
will be distributed approximately as $\chi^2$ with $K-1$ degrees of freedom.

A test statistic such as

$$r' = 1 - \frac{6\sum_{i=1}^{K}(R_i - i)^2}{K^3 - K}$$

(where $R_i$ is the number of $N_j$'s $\le N_i$) could be used to test the specific
alternative that the probabilities form an increasing sequence. Paraphrasing Hotelling
and Pabst, “The value of $\chi^2$ is unchanged if the classifications are relabeled in
any way, whereas $r'$ depends solely on which of the possible labelings exists.
Thus the two statistics are independently distributed.”

The random variables $N_1,\ldots,N_K$ have a symmetric distribution (they
are not independent) and thus the Hotelling-Pabst procedure is an application
of the results of this paper. It should be noted that the relevant random
variables are not the classifications of the individuals which are the observables,
but are the derived quantities $N_1,\ldots,N_K$, the numbers of individuals in the
various classes. In the Hotelling-Pabst example, ties can occur with positive
probability and thus randomization may become important.

Jonckheere [3] has also discussed the Hotelling-Pabst procedure. 

Another example is given in the second paragraph of the introduction. 


3. SEVERAL SAMPLES


Assume that we have k mutually independent samples from continuous
distributions, i.e., random variables $(X_{11},\ldots,X_{1N_1},\ldots,X_{k1},\ldots,X_{kN_k})$,
mutually independent, with common continuous cumulative
distribution function for those having a common first subscript. Then
statistics which are symmetric (within samples) and statistics which are
functions of the rank orders (within samples) are independently distributed. A
symmetric statistic now means a symmetric function of the observations within
a sample, i.e., functions whose value does not depend on the order in which the
sample values occur (in time). The rank orders are defined as previously. In
particular, the rank order of $X_{ij}$ is the number of observations in the ith
sample whose magnitude is less than or equal to $X_{ij}$.

The following examples illustrate this principle. 

(1) Assume we have an observation on a stochastic process and wish to test
the hypothesis that the increments are independent and identically distributed.
The alternatives we wish to guard against are such things as autocorrelation
and that $x(t)$ has some marked change at some time $t_m$ (say $x(t)$ is displacement
in the horizontal of an artillery shell and $t_m$ is the approximate time at which the
shell stops rising and starts descending). Let $y_i=x(t_m+i\delta)-x(t_m+(i-1)\delta)$
and $i=-(N_1-1),\ldots,N_2$. Then the null hypothesis implies that $y_i'=y_i$
$[i=-(N_1-1),\ldots,0]$ and $y_i$ $[i=1,\ldots,N_2]$ form two samples of identically
and independently distributed random variables. We might use the
Kolmogoroff goodness-of-fit test [6] to test that the observations from both samples
come from the same distribution. Let the statistic used be called $S$. We can test
for autocorrelation by computing statistics $R'$ and $R$ which are the rank order
autocorrelation coefficients from the first and second samples respectively.
The vector $(R', R)$ is a rank order statistic and since $S$ is symmetric we have
under the null hypothesis that $(R', R)$ and $S$ are independent. Also, of course,
under the null hypothesis $R'$ and $R$ are independent.

(2) Suppose k litters of animals ($N_i$ in the ith litter) are to be used for an
experiment. Preliminary to the experiment we wish to test that the average 
weights of the litters are the same. The null hypothesis is that all of the weights 
are independent, and are normal with common mean and variance. As alterna- 
tives we shall be concerned with the possibility that there is a birth order 
effect within litters and/or that the litters have different average weights. An 
analysis of variance could be used to test the homogeneity of the means. In 
order to test the birth order effect $R_i$, the rank correlation between weights and
order of birth [2], could be computed for each of the k litters. The appropriate 
F statistic is symmetric and the rank order correlations are mutually inde- 
pendent. A combined test based on these k+1 independent statistics could then 
be used. 


4. POWER


It is hard to describe the power functions of tests based on several statistics 
when the several statistics do not remain independent. When the tests are in- 
dependent even under the alternative hypotheses, it is possible to find the power 
function for each test separately and then find the combined power function 
in some cases. (It is trivial if we combine the test results by the second method 
discussed in Section 2.) 

If the random variables do remain independent and identically distributed,
then even though the parametric portion of the null hypothesis may be false,
the statistics used to test the “random” and “parametric” parts of the null
hypothesis do remain independent.


5. GENERALIZATIONS 


The results of this paper may be extended to the case where the observations 
instead of being real-valued are points in an abstract space. In this case the 
definition of symmetric statistic is not affected. However, rank order statistics 
must be redefined. One way of doing this is to form a real-valued function of 
the observations, and define the rank order statistics in terms of the values of 
this function. If this is done in such a manner that the probability of ties is zero,
then the results of the paper will be applicable.


TABLES FOR BEST LINEAR ESTIMATES BY ORDER STATISTICS OF
THE PARAMETERS OF SINGLE EXPONENTIAL DISTRIBU- 
TIONS FROM SINGLY AND DOUBLY 
CENSORED SAMPLES* 


A. E. SARHAN AND B. G. GREENBERG
University of North Carolina 


Tables are provided covering sample sizes up to 10. Interpretations
are made for the variation of the coefficients and variances of the
estimates as the number of censored observations varies. An index of
experimental efficiency termed “efficiency per unit of waiting time” can
be constructed with the tabulations provided. It is shown that (1) For
the one-parameter single exponential distribution, the estimate of $\sigma$
obtained from singly censored samples from the left has a smaller
variance than that based upon singly censored samples from the right,
provided that the number of missing observations is equal on both sides.
(2) For the two-parameter single exponential distribution: 

(a) The minimum value can be estimated more efficiently in samples 
censored from the right than in samples censored from the left, 
provided that the number of missing observations is equal; 

(b) the estimate of the standard deviation in singly and doubly cen- 
sored samples does not depend upon the side from which censor- 
ing takes place and depends only on the total number of missing 
observations. 


I. INTRODUCTION 


Censored samples are encountered because values of some of the
observations in a sample are unknown as a result of their occurrence below a
lower bound (or above an upper limit) imposed by either the observer or the
measuring process. The values beyond the limits are believed to form a
continuation of the scale of measurement.

The censored samples considered here are those in which the total number of 
sample elements is known but measurements on some of them at both extremes 
might be lacking. The censoring procedure might be performed in one of two 
ways, viz. censoring the observation because it falls outside fixed bounds (Type 
I) or censoring a fixed percentage of observations at either end of a sample 
(Type II). 
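
The two censoring schemes can be contrasted with a small sketch. The function names, bounds, and sample values below are illustrative assumptions and do not come from the paper.

```python
def censor_type_i(sample, lower, upper):
    """Type I: values outside fixed bounds are unobserved; only their counts are known."""
    ordered = sorted(sample)
    r1 = sum(1 for y in ordered if y < lower)   # censored on the left
    r2 = sum(1 for y in ordered if y > upper)   # censored on the right
    known = [y for y in ordered if lower <= y <= upper]
    return known, r1, r2


def censor_type_ii(sample, r1, r2):
    """Type II: a fixed number of the smallest and largest values is unobserved."""
    ordered = sorted(sample)
    return ordered[r1:len(ordered) - r2], r1, r2


sample = [3.1, 0.4, 7.9, 1.2, 5.5, 2.8, 12.6, 0.9]   # hypothetical measurements
known_i, r1_i, r2_i = censor_type_i(sample, lower=1.0, upper=8.0)
known_ii, r1_ii, r2_ii = censor_type_ii(sample, r1=1, r2=2)
```

Under Type I the counts of censored observations are random and the bounds are fixed; under Type II the counts are fixed in advance and the cut-off values are random.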

In experimental biology, a known number of individuals might be exposed
to an agent and the responses of some fall outside the limits. Thus, if $n$ animals
are injected with the same dose of antigen and blood samples from each animal
are tested for antibody response after a period of time, there may be only
$n - r_1$ of the animals with measurable amounts. This means that $r_1$ of the
animals developed the antibody to a level which cannot be measured by the
prevailing technique.

Of the $n$ items, the smallest $r_1$ observations are censored because of fixed
bounds and it is required, for example, to estimate the population mean and
standard deviation by using the largest $n - r_1$ observations. This sample is
called a singly censored sample from the left and this case was considered by
Ipsen [8], Cohen [1], Hald [6], Gupta [5] and others [2, 7, 12].


* Sponsored by the Office of Ordnance Research, U. S. Army.

Similarly, one may have $n$ items drawn at random from a population and, to
save time and expense, the experiment is discontinued before all items have
actually developed the phenomenon being observed. Such a decision to cut off
the experiment is made as soon as the first $n - r_2$ experimental units have
responded and the censoring is based upon a fixed proportion of the observations.

For example, a biologist may perform an experiment on animals to determine
the effect of exposure to a drug by noting reaction times. Some animals may
require an extremely long time to react. The experiment might be stopped when
a fixed percentage have reacted, i.e., the data are based on the smallest
$n - r_2$ items. This sample is termed a singly censored sample from the right.
This occurs in life testing, incubation periods, and fatigue testing and was
considered by Halperin [7], Hald [6], Gupta [5], Cohen [1], Epstein [3], and
Epstein and Sobel [4].

Furthermore, the above two situations may occur jointly such that there are
$r_1$ smallest observations in a sample of $n$ items that are missing plus $r_2$ largest
observations that are censored. This is termed a doubly censored sample. For
example, in certain studies of blood clotting, the speed of the reaction is such
that $r_1$ animals may respond almost spontaneously before individual measurements
can be taken on them whereas some animals barely respond and may require
an infinite waiting period. In such a situation, censoring on the left is by
Type I whereas that on the right is by Type II.

The case where observations are missing from both extremes is the most
general and the first two illustrations are special instances of it. The common
situations of censored samples encountered in practice are those which occur
with samples drawn from either an exponential or a normal distribution [2, 12].

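The main aim of this work is to provide tables for calculating estimates of
the mean and standard deviation from doubly censored samples drawn from
the one- and two-parameter exponential distributions given respectively by

$$f(y) = \frac{1}{\sigma}\, e^{-y/\sigma}, \qquad y \ge 0,\ \sigma > 0, \tag{1.1}$$

$$f(y) = \frac{1}{\sigma}\, e^{-(y-\mu)/\sigma}, \qquad y \ge \mu,\ \sigma > 0. \tag{1.2}$$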

A second objective is to interpret the foregoing tables by pointing out pat- 
terns and arrangements within the tables. These patterns can help to throw 
light upon the relative worthwhileness of individual observations and to pro- 
vide guidance in designing an experiment most efficiently. 


II. ESTIMATION OF PARAMETERS 


The best estimates of the parameters are given as a linear function of the
known ordered observations, i.e., the known observations are arranged in
ascending order and the best linear combination of them is obtained. These linear
estimates are termed linear systematic statistics [9]. These estimates are
simple, easy to calculate and of high efficiency.

The general formulae for the estimates, their variances, relative efficiencies 
and examples are presented herein for Type II censoring. The derivation of 
these formulae can be found in [11]. 

Consider a sample of size $n$ with the $r_1$ smallest observations and the $r_2$
largest observations missing, and denote by $y_i$ the $i$th observation in ascending
order of magnitude.

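For the one-parameter exponential distribution,

$$\sigma^* = \frac{1}{K}\left[\left(\frac{\sum_{i=1}^{r_1+1}\frac{1}{n-i+1}}{\sum_{i=1}^{r_1+1}\frac{1}{(n-i+1)^2}} - (n-r_1)\right) y_{r_1+1} + \sum_{i=r_1+1}^{n-r_2} y_i + r_2\, y_{n-r_2}\right], \tag{2.3}$$

$$V(\sigma^*) = \frac{\sigma^2}{K}, \tag{2.4}$$

with

$$K = \frac{\left(\sum_{i=1}^{r_1+1}\frac{1}{n-i+1}\right)^2}{\sum_{i=1}^{r_1+1}\frac{1}{(n-i+1)^2}} + (n - r_1 - r_2 - 1),$$

$$\text{median}^* = \sigma^* \log_e 2, \tag{2.5}$$

$$V(\text{median}^*) = \frac{\sigma^2}{K}\,(\log_e 2)^2, \tag{2.6}$$

where $\sigma^*$ is the best linear estimate of $\sigma$ and median* is the estimate of the
population median.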
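
The one-parameter estimate can be checked numerically with exact fractions. The sketch below assumes Type II censoring and writes $\sigma^*$ as a weighted sum of the known ordered observations, with $a = \sum_{i=1}^{r_1+1} 1/(n-i+1)$, $b = \sum_{i=1}^{r_1+1} 1/(n-i+1)^2$ and $K = a^2/b + (n - r_1 - r_2 - 1)$. The function name `one_param_weights` is illustrative and not taken from the paper.

```python
from fractions import Fraction


def one_param_weights(n, r1, r2):
    """Weights of y_{r1+1}, ..., y_{n-r2} in sigma*, and V(sigma*) / sigma^2."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    m = n - r1 - r2 - 1                 # spacings between known observations
    K = a * a / b + m
    weights = [Fraction(1)] * (m + 1)   # every known observation enters the sum once
    weights[-1] += r2                   # largest known value stands in for the r2 censored ones
    weights[0] += a / b - (n - r1)      # extra term carried by the smallest known value
    return [w / K for w in weights], 1 / K


weights, variance_factor = one_param_weights(n=3, r1=1, r2=0)
sigma_star = sum(w * y for w, y in zip(weights, [4, 9]))   # hypothetical y_2, y_3
```

For $n = 3$, $r_1 = 1$, $r_2 = 0$ the weights come out as 17/38 and 13/38, which agrees with the corresponding row of Table I.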
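For the two-parameter exponential distribution,

$$\mu^* = y_{r_1+1} - \sigma^* \sum_{i=1}^{r_1+1}\frac{1}{n-i+1}, \tag{2.7}$$

$$\sigma^* = \frac{1}{n-r_1-r_2-1}\left[\sum_{i=r_1+1}^{n-r_2} y_i - (n-r_1)\, y_{r_1+1} + r_2\, y_{n-r_2}\right], \tag{2.8}$$

$$V(\mu^*) = \sigma^2\left[\sum_{i=1}^{r_1+1}\frac{1}{(n-i+1)^2} + c\left(\sum_{i=1}^{r_1+1}\frac{1}{n-i+1}\right)^2\right], \tag{2.9}$$

and

$$V(\sigma^*) = c\,\sigma^2, \tag{2.10}$$

where

$$c = \frac{1}{n - r_1 - r_2 - 1}.$$

Furthermore,

$$\text{mean}^* = \mu^* + \sigma^*, \tag{2.11}$$

$$V(\text{mean}^*) = \sigma^2\left[\sum_{i=1}^{r_1+1}\frac{1}{(n-i+1)^2} + c\left(1 - \sum_{i=1}^{r_1+1}\frac{1}{n-i+1}\right)^2\right], \tag{2.12}$$

$$\text{median}^* = \mu^* + \sigma^* \log_e 2, \tag{2.13}$$

$$V(\text{median}^*) = V(\mu^*) - 2c\,\sigma^2 \log_e 2 \sum_{i=1}^{r_1+1}\frac{1}{n-i+1} + c\,\sigma^2 (\log_e 2)^2. \tag{2.14}$$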
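
A minimal sketch of the two-parameter estimates follows, again assuming Type II censoring and at least two known observations. Here $\sigma^*$ is the average of the normalized spacings of the known order statistics, $\mu^* = y_{r_1+1} - a\,\sigma^*$ with $a = \sum_{i=1}^{r_1+1} 1/(n-i+1)$, and the variance factors are expressed in units of $\sigma^2$. The function name and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from math import log


def two_param_blue(known, n, r1, r2):
    """known: the n - r1 - r2 observed values in ascending order."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    c = Fraction(1, n - r1 - r2 - 1)
    sigma = c * (sum(known) - (n - r1) * known[0] + r2 * known[-1])
    mu = known[0] - a * sigma
    estimates = {"mu": mu, "sigma": sigma, "mean": mu + sigma,
                 "median": mu + sigma * log(2)}
    # Variances divided by sigma^2.
    variance_factors = {"mu": b + c * a * a, "sigma": c,
                        "mean": b + c * (1 - a) ** 2,
                        "median": b + c * (a - log(2)) ** 2}
    return estimates, variance_factors


# Hypothetical complete sample of size 4 (r1 = r2 = 0).
est, var = two_param_blue([2, 3, 5, 10], n=4, r1=0, r2=0)
```

For a complete sample the estimate of the mean reduces to the ordinary sample mean (5 here) with variance factor $1/n$, which is a convenient sanity check on the formulas.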


III. TABLES


The tables are provided for all possible combinations where the samples are
of sizes $\le 10$ since these values are the ones most commonly encountered.
Extension of these tables to larger values of $n$ is straightforward.

Table I gives the coefficients ($w_i'$) in the best linear estimate of $\sigma$ in the
one-parameter single exponential distribution from singly and doubly censored
samples of sizes $\le 10$ such that

$$\sigma^* = \sum_{i=r_1+1}^{n-r_2} w_i'\, y_i. \tag{3.1}$$

The coefficients in each row of Table I must be divided by the common divisor
given in the last column.
This table shows that:

(1) For a fixed value of $r_1$, the numerator in the coefficient of the largest known
observation increases as $r_2$ increases. In fact it will be increased by the sum of the
numerators of the coefficients attached to the censored observations.

(2) For a fixed value of $r_1$, the numerators of the coefficients of the smallest known
observations will be equal regardless of the value of $r_2$.

(3) The numerators of the coefficients of the middle elements are always equal.
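
The row layout of Table I (integer numerators followed by a common divisor) can be reproduced with exact arithmetic. This is a sketch under the assumption of Type II censoring; `table_i_row` is an illustrative name, and `math.lcm` with several arguments needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def table_i_row(n, r1, r2):
    """Integer numerators of the weights in sigma* and their common divisor."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    m = n - r1 - r2 - 1
    K = a * a / b + m
    w = [Fraction(1)] * (m + 1)
    w[-1] += r2
    w[0] += a / b - (n - r1)
    w = [x / K for x in w]
    divisor = lcm(*(x.denominator for x in w))
    return [int(x * divisor) for x in w], divisor


rows = [table_i_row(5, 1, r2) for r2 in range(3)]
```

For $n = 5$ and $r_1 = 1$ the rows are (57, 41, 41, 41 over 204), (57, 41, 82 over 163) and (57, 123 over 122): the numerator of the smallest known observation stays at 57, the middle numerators are equal, and the largest known observation absorbs the numerators of the censored ones, as stated in observations (1) to (3).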


Table II gives the exact variance (in terms of $\sigma^2$) of the estimate of $\sigma$ in the
same distribution from singly and doubly censored samples of sizes $\le 10$ for
different values of $r_1$ and $r_2$.

This table shows that:

(4) The estimate based upon singly censored samples from the left has a smaller
variance than that based upon singly censored samples from the right when the
number of missing observations is equal on both sides.

(5) The denominator of the variance is the common divisor from Table I. The
numerators in any row ($n$ and $r_1$ fixed) are all equal and the same value as the
coefficient of the largest known observation indicated in Table I when $r_2 = 0$ for
that specific $n$ and $r_1$.
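
Observation (4) can be verified directly from the variance factor $1/K$ of the one-parameter estimate, where $K = a^2/b + (n - r_1 - r_2 - 1)$ with $a$ and $b$ the sums of $1/(n-i+1)$ and $1/(n-i+1)^2$ over $i = 1, \dots, r_1+1$. The helper name below is an assumption made for illustration.

```python
from fractions import Fraction


def var_sigma_one_param(n, r1, r2):
    """V(sigma*) / sigma^2 for the one-parameter exponential distribution."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    return 1 / (a * a / b + (n - r1 - r2 - 1))


left = var_sigma_one_param(5, 1, 0)    # one observation censored on the left
right = var_sigma_one_param(5, 0, 1)   # one observation censored on the right
```

For $n = 5$ the factor is 41/204 when the smallest observation is missing and 1/4 when the largest is missing, so censoring from the left costs less precision here.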


Table III gives the percentage efficiencies of the estimate of $\sigma$ from singly
and doubly censored samples of sizes $\le 10$ relative to the best linear estimate
based on the complete uncensored sample. By examining the entries in this
table in a diagonal fashion (i.e., $n$ and $(r_1 + r_2)$ fixed), it can be seen that the
efficiency declines more rapidly when the censoring is from the right. This means
that the experimenter is sacrificing more precision when censoring is from the
right. On the other hand, the expected waiting time for the largest observations
may make this sacrifice a desirable one.
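
The percentage efficiency is the ratio of the complete-sample variance, $\sigma^2/n$, to the censored-sample variance, $\sigma^2/K$, that is, $100\,K/n$. A short sketch (with an illustrative function name) reproduces individual entries and the diagonal pattern described above.

```python
from fractions import Fraction


def efficiency_one_param(n, r1, r2):
    """Percentage efficiency of sigma* relative to the complete sample."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    K = a * a / b + (n - r1 - r2 - 1)
    return 100 * float(K / n)


# n = 9 with two observations censored in total, moving from left to right.
diagonal = [efficiency_one_param(9, r1, 2 - r1) for r1 in (2, 1, 0)]
```

With $n = 9$ and two observations censored in total, the efficiency is about 99.65 when both are on the left, 88.81 with one on each side, and 77.78 when both are on the right.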

Table IV gives the proportional reduction in expected waiting time for a
singly censored sample from the right of size $\le 10$ with differing values of $r_2$.

By combining the results from Tables III and IV, a measure termed “efficiency
per unit of waiting time” can be calculated to guide the experimenter
in determining the advisability of censoring a given sample. From the appearance
of the graph in Figure 1, the “efficiency per unit of waiting time” is seen to
decline as more observations from the right go uncensored. This decline is
somewhat constant until the point $r_2 = 4$. At this point, the rate of decline is
accelerated and the additional waiting time may be unremunerative. Consequently,
for samples of size 6 to 10, the observations might be censored profitably
when 3 or 4 of the largest observations still remain.
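
The waiting-time entries can be sketched from the expected value of an exponential order statistic, $E(y_k) = \sigma \sum_{i=1}^{k} 1/(n-i+1)$, so that the ratio of the expected time to the $(n - r_2)$th failure to the expected time to the last failure is free of $\sigma$. The paper does not spell out how the two tables are combined; dividing the efficiency by this ratio is an assumption made here for illustration only, and the function names are hypothetical.

```python
from fractions import Fraction


def waiting_time_ratio(n, r2):
    """E[y_(n-r2)] / E[y_(n)] for a one-parameter exponential sample."""
    full = sum(Fraction(1, k) for k in range(1, n + 1))
    stopped = sum(Fraction(1, n - i + 1) for i in range(1, n - r2 + 1))
    return stopped / full


def efficiency_per_unit_wait(n, r2):
    """Assumed reading: efficiency (right censoring only) over relative waiting time."""
    efficiency = Fraction(n - r2, n)
    return efficiency / waiting_time_ratio(n, r2)


ratios = [waiting_time_ratio(10, r2) for r2 in range(3)]
index = [float(efficiency_per_unit_wait(10, r2)) for r2 in range(9)]
```

For $n = 10$ the ratios are 1, 4861/7381 and 3601/7381, matching the fractions printed in Table IV. Under the assumed definition the index rises as $r_2$ increases, which is the same direction as the decline described above when more observations go uncensored; the exact shape of Figure 1 is not claimed to be reproduced.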

Similarly, a more meaningful measure of experimental efficiency can be con- 
structed if the waiting time can be converted to a cost function. In this event, 
one could calculate “efficiency per unit of cost” where the cost would consist 
not only of the increment for the extra waiting time but also the original ex- 
penses of setting up the experiment. 

Table V provides the exact coefficients ($w_{1i}$) for the best linear estimate of
the population value of $\mu$ in the two-parameter single exponential distribution
from singly and doubly censored samples. The values of $w_{1i}$ are calculated up
to samples of size 10 with different values of $r_1$ and $r_2$ such that

$$\mu^* = \sum_{i=r_1+1}^{n-r_2} w_{1i}\, y_i. \tag{3.2}$$

All elements in each row should be divided by the proper divisor given in the
last column. The divisor is given by 


1 1 


(3.3) 


In some instances, the value of the denominator given by (3.3) may not be
identical with the one in Table V because a common factor was cancelled from
both the denominator and the numerator. The coefficients (both numerator
and denominator) in Table V show the following systematic changes as $r_1$, $r_2$,
and $n$ vary:


(6) In complete sample estimation (i.e., $r_1 = r_2 = 0$), the numerator of the smallest
sample element is given by $(n+1)(n-1)$ and all the other elements have numerator
$= -1$ while the divisor is $n(n-1)$.

(7) For a fixed $r_1$ and as $r_2$ increases, the numerators of the coefficients of the largest
known elements decrease (increase in absolute value). In fact, the actual value is
equal to the sum of the numerators which were attached to the censored elements
plus that of the largest known observation.

(8) The numerators of the middle elements are always equal.

(9) For any fixed value of $r_1$, the smallest known element always has a numerator
which decreases as $r_2$ increases by a fixed number that equals the denominator
for the fixed value of $r_1$ and the largest $r_2$ possible. In most cases this decrement is
also equal to the denominator of the previous value of $r_1$ and smallest $r_2$.
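
The patterns noted for Table V can be checked by generating a row with exact fractions. The sketch assumes $\mu^* = y_{r_1+1} - a\,\sigma^*$, where $a = \sum_{i=1}^{r_1+1} 1/(n-i+1)$ and $\sigma^*$ is the mean of the normalized spacings of the known observations; `table_v_row` is an illustrative name, and at least two known observations are required.

```python
from fractions import Fraction
from math import lcm


def table_v_row(n, r1, r2):
    """Integer numerators of the weights in mu* and their common divisor."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    m = n - r1 - r2 - 1
    # Weights of sigma*: -(n - r1 - 1), 1, ..., 1, r2 + 1, all divided by m.
    s = [Fraction(1, m)] * (m + 1)
    s[-1] += Fraction(r2, m)
    s[0] -= Fraction(n - r1, m)
    w = [-a * x for x in s]
    w[0] += 1                      # mu* = y_{r1+1} - a * sigma*
    divisor = lcm(*(x.denominator for x in w))
    return [int(x * divisor) for x in w], divisor


complete = table_v_row(4, 0, 0)
row_a = table_v_row(10, 3, 0)
row_b = table_v_row(10, 3, 1)
```

For the complete sample with $n = 4$ this gives 15, −1, −1, −1 over 12, in line with observation (6). For $n = 10$, $r_1 = 3$ the leading numerator falls from 22,362 to 19,842 as $r_2$ goes from 0 to 1, a drop of 2,520, while the largest known observation moves from −1,207 to −2,414, as in observations (7) and (9).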


Table VI is constructed to give the exact coefficients ($w_{2i}$) in the best linear
estimate of the population standard deviation $\sigma$ for the two-parameter single
exponential distribution from singly and doubly censored samples such that

$$\sigma^* = \sum_{i=r_1+1}^{n-r_2} w_{2i}\, y_i. \tag{3.4}$$

All coefficients in each row must be divided by the proper divisor for that row
given in the last column. This divisor is calculated from

$$n - r_1 - r_2 - 1. \tag{3.5}$$

From Table VI, the following observations can be made:


(10) The coefficient ($w_{2i}$) of the smallest known element in samples censored only
from the left (i.e., $r_2 = 0$) is always $= -1$, and all the other coefficients are equal
and have the value of $1/(n - r_1 - 1)$.

(11) For a fixed value of $r_1$, the coefficients of the smallest known elements are equal
regardless of the value of $r_2$.

(12) For fixed values of $n$ and $r_1$, as $r_2$ increases the numerator of the largest known
observation will increase and in fact, will be equal to the sum of the numerators
of the coefficients attached to the censored elements plus that of the corresponding
element for the largest known observation.

(13) As $r_1$ increases, the numerators of the smallest elements increase by a difference
equal to the increase in $r_1$.

(14) The coefficients of the middle elements are always equal.
Since (mean)* $= \mu^* + \sigma^*$, the exact coefficients ($w_{3i}$) are given in Table VII
for the best linear estimate of the mean in singly and doubly censored samples
from the two-parameter single exponential distribution such that

$$\text{mean}^* = \sum_{i=r_1+1}^{n-r_2} w_{3i}\, y_i. \tag{3.6}$$

All coefficients in each row must be divided by the proper divisor given in the
last column.
From Table VII the following observations are noted:


(15) For a fixed value of $n$ and $r_1$, the numerator of the coefficient of the largest
known element increases as $r_2$ increases, and in fact the numerator is equal to the
sum of these coefficients of the censored elements plus that of the corresponding
elements for the largest known observation (opposite to that of observation #7
and same as #12).

(16) For a fixed $r_1$, as $r_2$ increases the smallest known observation has a numerator
which decreases by a number equal to the last divisor for the fixed $r_1$ and largest
$r_2$. (Same as observation #9.)

(17) The middle elements always have equal coefficients. (Same as observation #8.)


Table VIII gives the variances of the best linear estimate $\mu^*$ in singly and
doubly censored samples of sizes $\le 10$ from the two-parameter single exponential
distribution for different values of $r_1$ and $r_2$ in terms of $\sigma^2$. The entries in
Table VIII were based upon the exact fractional values for the variances but
were converted to decimals to facilitate reading of the table. From Table VIII
the following points can be noted:

(18) The variance of the estimate increases as $r_2$ increases for a fixed value of $r_1$ and
vice versa. When the number of censored observations is equal on both sides the
variance of the estimate of $\mu$ is less for singly censored samples from the right
than that obtained from singly censored samples from the left.

(19) For a fixed $n$ and $r_1$, the variance of the estimate of $\mu$ does not undergo much
increase as $r_2$ increases except for the last possible value of $r_2$. In other words,
the variance of the estimate of $\mu$ is roughly independent of $r_2$ provided that $r_1$ is
small.
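
The comparison in observation (18) follows from the variance factor of $\mu^*$, namely $b + a^2/(n - r_1 - r_2 - 1)$ with $a = \sum_{i=1}^{r_1+1} 1/(n-i+1)$ and $b = \sum_{i=1}^{r_1+1} 1/(n-i+1)^2$. The following sketch (illustrative function name) evaluates it exactly.

```python
from fractions import Fraction


def var_mu(n, r1, r2):
    """V(mu*) / sigma^2 for the two-parameter exponential distribution."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    return b + a * a / (n - r1 - r2 - 1)


right = var_mu(5, 0, 1)   # largest observation censored
left = var_mu(5, 1, 0)    # smallest observation censored
```

For $n = 5$ the factor is 4/75 when the largest observation is missing and 17/100 when the smallest is missing. For a complete sample the expression reduces to $1/\{n(n-1)\}$.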


Table IX gives the percentage efficiencies of the best linear estimate $\mu^*$ in
singly and doubly censored samples relative to uncensored samples from the
two-parameter exponential distribution for $n \le 10$ and all possible values of $r_1$
and $r_2$.

From Table IX, the following is noted:

(20) Reading this table in diagonal fashion such that $(r_1 + r_2)$ is fixed for a given $n$,
it can be seen that the efficiency increases as the sample is censored from the right
rather than the left. This is directly opposite to the situation encountered in
estimating $\sigma$ in the one-parameter exponential distribution.


Tables X and XI are constructed to show the exact variance and relative
efficiencies of the estimate of the population standard deviation, $\sigma$, for the same
sample size and different values of $r_1$ and $r_2$ from the same distribution.


(21) Tables X and XI show that the variances and relative efficiencies of the
estimate of the population standard deviation are independent of the side from
which censoring takes place. The values in both tables show that the effect of
censoring depends only upon the total number of missing observations for any
given $n$.
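
Observation (21) is immediate once the variance factor of $\sigma^*$ is written as $1/(n - r_1 - r_2 - 1)$, since only the total $r_1 + r_2$ enters. A small sketch with illustrative function names:

```python
from fractions import Fraction


def var_sigma_two_param(n, r1, r2):
    """V(sigma*) / sigma^2 for the two-parameter exponential distribution."""
    return Fraction(1, n - r1 - r2 - 1)


def efficiency_sigma(n, r1, r2):
    """Percentage efficiency relative to the complete sample."""
    return 100 * var_sigma_two_param(n, 0, 0) / var_sigma_two_param(n, r1, r2)


# Three missing observations out of ten, split in every possible way.
splits = [efficiency_sigma(10, r1, 3 - r1) for r1 in range(4)]
```

Every split of three missing observations in a sample of ten gives the same efficiency, 200/3 per cent (about 66.67).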


Table XII gives the variances of the estimate of the population mean for
censored samples of sizes $\le 10$ and with varying values of $r_1$ and $r_2$. This table
is expressed to seven decimal places although the values were calculated
exactly as fractions which are available from the authors.


(22) Table XII shows that the variance of the estimate of the mean increases more
rapidly with censoring from the right than it does with censoring from the left.
This is true in general but for $n \ge 7$, the reverse is true for the largest possible
value of $r_1$ or $r_2$.
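
The behaviour described in observation (22) can be explored with the variance factor of the estimated mean, $b + (1 - a)^2/(n - r_1 - r_2 - 1)$, where $a$ and $b$ are the sums of $1/(n-i+1)$ and $1/(n-i+1)^2$ over $i = 1, \dots, r_1+1$. The function name is an assumption made for illustration.

```python
from fractions import Fraction


def var_mean(n, r1, r2):
    """V(mean*) / sigma^2 for the two-parameter exponential distribution."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    b = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r1 + 2))
    return b + (1 - a) ** 2 / (n - r1 - r2 - 1)


# Same number k of censored observations, all on the right versus all on the left.
typical = (var_mean(5, 0, 1), var_mean(5, 1, 0))
heavy_n6 = (var_mean(6, 0, 4), var_mean(6, 4, 0))
heavy_n7 = (var_mean(7, 0, 5), var_mean(7, 5, 0))
```

In the first two cases the right-censored variance is the larger one. For $n = 7$ with five observations censored, which is the largest number that still leaves two known values, the ordering is reversed.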


Table XIII presents the relative efficiencies of the estimate of the popula- 
tion mean for the same distribution with different degrees of censoring. 


IV. EXAMPLE


The data for this example are part of an experiment¹ in which ten rabbits
were inoculated with 0.2 ml of graded inoculum containing varying numbers of
Treponema pallidum. Each rabbit received six injections from solutions containing
$10^1$, $10^2$, $10^3$, $10^4$, $10^5$ and $10^6$ spirochetes per ml, and was then observed for
a period of 90 days to determine whether a syphilitic lesion developed at the site
of injection.

The incubation time required for a lesion to appear is an index of the amount
and potency of the inoculation as well as the susceptibility of the individual
rabbit. The distribution of incubation periods follows the two-parameter
exponential distribution.

Knowledge of the reaction mechanism in rabbits has indicated that censoring
the observations at 90 days after inoculation is desirable because only an
infinitesimal proportion of rabbits will have an incubation period beyond that.
In fact, the present data will be considered censored at one-half that period,
viz. 45 days. Experience has also shown that with data of this type the
reciprocal transformation not only tends to stabilize the variance but also to
make the relationships additive [13]. In other words, the harmonic mean is
calculated as the measure of central tendency. Censoring from the right
presents no problem under this transformation since an inoculation site which does
not develop into a lesion is considered to represent an infinite incubation period.

During the experiment, the rabbits are examined about twice a week for le- 
sions. Those lesions which develop in the interim between examinations are 
undetected until the next period. Lesions which are one or two days old at 
time of first observation can be distinguished by their greater size. The first 
examination is performed approximately one week following inoculation. This 
results in the fact that certain observations can be considered censored from 
the left if the size of the lesion is large at the first examination period. 


1 The authors would like to thank Dr. Harold J. Magnuson, Venereal Disease Experimental Laboratory, United 
States Public Health Service, for permission to use these data from Experiment #30. 
The data from one portion of this experiment are presented in the table
below:


DAYS OF INCUBATION AMONG TEN RABBITS FOLLOWING INOCULATION
WITH GRADED AMOUNTS OF TREPONEMA PALLIDUM

                                     Inoculum
Rabbit number          10^6     10^5     10^4     10^3     10^2     10^1
7                        <7      <11       18      <18      >45      ...
8                        11       11       18       18      >45      ...
9                        <7      <11       14      >45      >45      ...
10                        7       11       18      <18      <25      ...
11                       11       14       18       25       25      ...
12                       14       14       18       21       25      ...
13                        7       11       18       18       32      ...
14                      >45       35       40       25       25      ...
15                        7       14       18       25      ...      ...
16                       11       14       18       21      ...      ...

(mean)*                9.88    14.41    19.80    21.76      ...    50.51
Harmonic mean A        9.45    13.01    18.49    22.86      ...    56.74
Harmonic mean B        9.04    12.71    18.49    22.52      ...    56.21
$\mu^*$                5.54     9.27    13.36    13.80      ...    18.17


There is a slight deviation from the original data in one respect. For
illustrative purposes of censoring from the left, it has been assumed that some of the
rabbits had lesions large enough at the time of first examination to presume that
the true incubation period ended a few days earlier. For example, note that when
the $10^6$ inoculum was used, rabbits 7 and 9 were considered to have lesions large
enough to presume that the incubation was less than 7 days. In the table
presented above, the values of (mean)* for each inoculum have been calculated
using the coefficients $w_{3i}$ from Table VII.² For comparative purposes, two
harmonic means have been indicated below the values for (mean)*. Harmonic
mean A has been calculated under the assumption that there was no censoring
from the left. For example, rabbits 7 and 9 were assigned an incubation period
of exactly 7 days for the $10^6$ inoculum. Harmonic mean B has been
calculated under the assumption that there was censoring from the left so that the
incubation period for those rabbits at that dose was one day less.

One can see from the table that the harmonic means did quite respectably
although not as well as the exact estimate. As expected, the harmonic means
are slightly lower than the arithmetic mean until that dose is reached for which
a fair proportion of animals do not develop any lesion. By assigning those
animals an infinite incubation period, the result is to pull even the harmonic mean
above the other mean.

The reciprocal transformation is of little value, however, if the earliest
incubation day, which is represented by $\mu^*$, is to be estimated. The values of $\mu^*$ for
each dose are given in the last row of the table using the coefficients $w_{1i}$ from
Table V. The relationship of $\mu^*$ to the mean value is evident and this throws
further light upon the speed of the reaction.


2 The type of censoring practiced in this example was of Type I whereas the coefficients used to estimate the
parameters were based upon the assumption of Type II censoring. Sampford [10] indicates that the possible bias
caused by this factor is negligible and of no practical import. Sampling investigations conducted by the present
authors have confirmed the relative unimportance of this potential bias.
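
The entries for the $10^6$ inoculum can be reproduced as a worked check. At that dose rabbits 7 and 9 are censored from the left ($r_1 = 2$), rabbit 14 is censored from the right ($r_2 = 1$), and the seven known values are 7, 7, 7, 11, 11, 11 and 14 days. The sketch below uses $\sigma^*$ as the mean of the normalized spacings and $\mu^* = y_{r_1+1} - a\,\sigma^*$ with $a = \sum_{i=1}^{r_1+1} 1/(n-i+1)$; the function names are illustrative, and treating a rabbit without a lesion as contributing a reciprocal of zero follows the description of the harmonic means in the text.

```python
from fractions import Fraction


def two_param_estimates(known, n, r1, r2):
    """Return (mu*, mean*) from the known ordered observations."""
    a = sum(Fraction(1, n - i + 1) for i in range(1, r1 + 2))
    m = n - r1 - r2 - 1
    sigma = Fraction(1, m) * (sum(known) - (n - r1) * known[0] + r2 * known[-1])
    mu = known[0] - a * sigma
    return mu, mu + sigma


def harmonic_mean(days, n):
    """Animals not listed in `days` count as infinite incubation (reciprocal 0)."""
    return n / sum(1 / d for d in days)


known = [7, 7, 7, 11, 11, 11, 14]          # observed incubation days, 10^6 inoculum
mu_star, mean_star = two_param_estimates(known, n=10, r1=2, r2=1)
hm_a = harmonic_mean([7, 7] + known, n=10)  # left-censored rabbits set to 7 days
hm_b = harmonic_mean([6, 6] + known, n=10)  # left-censored rabbits set to 6 days
print(round(float(mu_star), 2), round(float(mean_star), 2))
print(round(hm_a, 2), round(hm_b, 2))
```

The rounded results, 5.54 and 9.88 for $\mu^*$ and (mean)* and 9.45 and 9.04 for harmonic means A and B, agree with the first column of the table.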
(Tables follow on pp. 68-86; Figure 1 is given on p. 87.)
TABLE I

COEFFICIENTS ($w_i'$) IN THE BEST LINEAR ESTIMATE OF $\sigma$ FOR THE
ONE-PARAMETER SINGLE EXPONENTIAL DISTRIBUTION FROM
SINGLY AND DOUBLY CENSORED SAMPLES

(The coefficients in each row must be divided by the common divisor given
in the last column of each row)

| 68 1957 
3001 1 1 3 
2 2 
1 0 17 13 38 
1 1 1 4 
1 2 3 
‘3 2 3 2 
1 0 34 25 25 99 
2 34 50 74 
2 0 95 61 230 
5 1 1 1 1 1 5 
1 1 1 2 4 
1 1 3 3 
1 4 2 
f 57 41 41 41 204 
57 41 82 163 
57 123 122 
1,282 769 769 ‘ 747 
1,282 1,538 ,978 
2,951 1,669 
6 1 1 1 1 1 6 
1 1 1 2 5 
1 1 3 4 
1 4 3 
5 2 
86 61 61 61 61 365 
86 61 61 122 304 
86 61 183 243 
86244 182 
813 469 469 469 2,776 
| 813 469 938 2,307 
813 1,407 1,838 
1,682 869 869 4,987 
1,682 1,738 4,118 
4 3,451 1,769 9,338 
| 1 1 1 1 1 1 1 7 
1 1 1 1 1 2 6 
1 1 1 1 3 5 
1 1 1 4 4 
1 1 5 3 
1 6 2 
121 85 85 85 85 85 594 
121 85 85 85 170 509 
121 85 85 255 424 
121 85 340 339 
121 425 254 
6,914 3,889 $3,889 3,889 3,889 27,005 
6,914 3,889 3,889 7,778 23,116 
6,914 3,889 11,667 19,227 
6,914 15,556 15,338 
54,237 26,581 26,581 26,58! 181,504 
54,237 581 53,162 154,923 
54,237 79,743 128,342 
100,418 46,181 46,181 303 ,043 
100,418 92,362 256 , 862 
190,699 90,281 537,842 
~ 1 1 1 1 1 1 1 1 ~ 
1 1 1 1 1 1 2 7 
1 1 1 1 1 3 6 
1 1 1 1 4 5 
1 1 1 5 4 
1 1 6 3 
1 7 2 
162 113 113 113 113 113 113 903 
162 113 113 113 113 226 790 
162 113 113 113 339 677 
162 113 113 452 564 
162 113 565 
162 678 
3,259 1, 1, 1,901 1,801 1,801 
3,259 1, 4, 1,801 3,602 
3,259 1, 1, 5,403 
3,259 1, 
3,259 9, 
154, 73, 73,249 73,249 73,249 
154, 73, 73,249 146,498 
154, 73, 219,747 
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1 
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2 
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3 
3 
3 
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4 
4 
4 
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5 
5 
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7 
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0 
0 
0 
0 
0 
0 
0 
0 
1 
1 
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1 
1 
1 
1 
1 
2 
2 
2 
2 
2 
2 
2 
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3 
3 
3 
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5 
5 
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6 
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7 
TABLE II


EXACT VARIANCES OF THE ESTIMATE OF $\sigma$ FOR THE ONE-PARAMETER
SINGLE EXPONENTIAL DISTRIBUTION FROM SINGLY AND
DOUBLY CENSORED SAMPLES (IN TERMS OF $\sigma^2$)


1 2 3 
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61 
243 
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n 0 ee 4 5 6 7 8 
1 ~ 
3 0 - 
1 
4 0 
1 
2 
1 
5 0 pal 
2 
1 
2 
3 
0 
5 3 2 
| 61 61 
182 
2 
3 
9,338 
1 1 1 1 1 
7 0 - - - - - 
6 5 4 3 2 
3 85 85 85 85 
509 424 339 254 
3,889 3,889 3,889 3,889 
27 ,005 23,116 19,227 15,338 
3 26 ,581 26,581 26 ,581 
181,504 154 ,923 128 ,342 
‘ 46,181 46,181 
303 ,043 256 , 862 
90,281 
537 ,842 | 
1 1 1 1 1 1 1 
8 7 6 5 4 3 2 : 
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113 113 
903 790 
1,801 1,801 
14,334 12 ,533 
73 ,249 73 ,249 
577 ,085 503 , 836 
117 ,349 117 ,349 
904 ,096 786,747 
195 ,749 195 ,749 
1,242,278 


1,159 
12,289 
110,215 97 ,926 
19,345 19,345 
172 ,350 153 ,005 
737 ,641 737 ,641 
6,481,205 5,743,564 5,005,923 
1,134,541 1,134,541 1,134,541 
9,698,704 8,564,163 7,429,622 
1,840,141 1,840,141 
14,896,083 13,055,942 
3,427,741 
24,670 ,622 


1 1 


1 1 
10 9 7 6 


181 181 181 181 18i 


1,809 1,628 1,447 1,266 1,085 
4,921 4,921 4,921 4,991 4,921 
49,088 44,167 «30,246 34,325 404 
870,720 870,729 870,729 370,729 370,729 

3,681,223 3,310,494 2,939,765 2,569,036 2,198,307 
547,129 547,129 547,120 «547,129 547,120 

5,382,774 4,835,645 4,288,516 3,741,387 3,194,258 
801,145 801,145 801,145 145 

7,745,741 6,944,508 6,143,451 5,342,306 

1,198,045 1,198,045 1,198,045 

11,217,256 10,019,211 8,821,166 

1,903,645 1,903,645 

16,774,491 14,870,846 

8,401,245 

27,120 ,566 


" = 0 1 2 3 4 5 6 7 8 
113 113 113 113 
677 564 451 338 
P 1,801 1,801 1,801 
10,732 8,931 7,130 
73,249 73,249 
430,587 357,338 
117,349 
669,398 
5 
1, 
372,149 
2,454,308 
1 1 1 1 1 1 “eat 
9 8 7 6 5 4 ae 
145 145 145 145 
869 724 579 434 
12,289 12,289 12,289 
73,348 61,059 48,770 
3 19,345 193454 
114,315 94,970 
737 ,641 
4,268,282 
5 
6 
7 
1 
5 
181 181 181 
904 
41921 4,921 
24,483 19,562 
370,729 
1,827 ,578 
4 
5 
6 
7 
8 
| 
TABLE III


PERCENTAGE EFFICIENCIES OF THE ESTIMATE OF $\sigma$ IN THE ONE-PARAMETER
SINGLE EXPONENTIAL DISTRIBUTION FROM SINGLY
AND DOUBLY CENSORED SAMPLES


The efficiencies are calculated relative to the best linear estimate
based on the complete sample


T: 


4 




0 
1 
2 
3 
4 
5 


n 
a 1 2 3 a 5 6 7 8 
3 0 66.67 
: 
4 0 75.00 50.00 
1 74.00 
5 80.00 60.06 40.00 
79.51 59.51 
97.45 77.45 
91.05 
6 100.00 83.37 66.67 50.00 33.33 
99.73 83.06 66.39 49.73 
98.65 81.98 65.32 
95.65 78.98 
87.98 
7 100 .00 71.438 57.14 42.86 28.57 
99.83 71.26 56.97 42.69 
99.19 70.63 56.34 
97.55 68 .98 
93.75 
85.11 
8 75.00 62.50 50.00 37.50 25.00 
74.89 62.39 49.89 37.39 
74.49 61.99 49.49 
73.48 60.98 
71.30 
9 0 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 
1 99.92 88.81 77.70 66.59 55.48 44.37 33.25 
2 99.65 88.54 77.43 66.32 55.21 44.10 
3 98.99 87.88 76.77 65.66 54.55 
4 97.63 86.51 75.41 64.31 
5 94.98 83.89 72.77 
6 89.95 78.88 
7 80.01 
| 


TABLES FOR ESTIMATES FROM CENSORED SAMPLES 
TABLE III—(continued) 




TABLE IV


PROPORTIONAL REDUCTION IN EXPECTED WAITING TIME TO OBSERVE
THE FIRST $n - r_2$ FAILURES IN SINGLY CENSORED SAMPLES DRAWN
FROM THE ONE-PARAMETER SINGLE EXPONENTIAL DISTRIBUTION


7381 7381 


n T1 
0 | 2 3 4 5 6 7 8 
: 10 80.00 70.00 60.00 50.00 40.00 30.00 20.00 
79.94 69.94 59.94 49.94 39.94 29.94 
79.75 69.75 59.75 49.75 39.75 
: 79.30 69.30 59.30 49.30 
78.38 68.38 58.38 
76.68 66.68 
73.63 
0 1 2 3 4 5 6 7 8 9 
2 1 
3 
ll 1l 
13 7 3 
4 
25 25 25 
1387 187 187 187 
147 147 147 147 147 
869 459 319 214 130 60 
1089 1089 1089 1089 1089 1089 
8 1 1443 1023 743 533 365 225 = 105 
2283 2283 2283 2283 2283 2283 2283 
9 1 4609 3349 2509 1879 1375 955 595 280 
7129 7129 7129 7129 7129 7129 7129 7129 
10 1 4861 3601 2761 2131 1627 1207 847 532 252 
| 7381 7381 7381 7381 7381 7381 7381 [7 
TABLE V


THE EXACT COEFFICIENTS ($w_{1i}$) IN THE BEST LINEAR ESTIMATE OF $\mu$
FOR THE TWO-PARAMETER SINGLE EXPONENTIAL DISTRIBUTION
FROM SINGLY AND DOUBLY CENSORED SAMPLES


(The coefficients in each row must be divided by the common divisor
given in the last column of each row)


ys ye ys 


gest 


0 
0 
1 
0 
0 
0 
1 
1 
2 
0 
0 
0 
0 
1 
1 
1 
2 
2 
3 
0 
0 
0 
0 
0 
1 
1 
1 
1 
2 
2 
2 
3 
3 
4 
0 
0 
0 
0 
0 
0 
1 
1 
1 
1 
1 
2 
2 
2 
2 
3 
3 
3 
4 
4 
5 
0 
0 
0 
0 
0 
0 
0 
1 
1 




3 
4 
5 
7 
1 
| 
1 
1 
2 
2 
2 
2 
2 
3 
3 
3 
4 0 4,749 -—743 -—743 
3,069 —2,229 
5 0 3,726 —1,023 —1,023 
2,886 —2,046 
6 0 2,283 1,443 
-1 -1 -1 -1 -1 -1 
-1 -1 -1 -1 -1 -1 
-1 -1 -1 -3 
-1 -1 -1 -1 -4 
-1 -1 -1 -5 
-1 -1 
-7 
6232-17 -17 -17 -17 -17 -17 
551 —17 -17 -17 -17 -17 —34 
479-17 -17 -17 -17 --51 
4070-17 -17 -17 —68 
3350-17 -17 —85 
2634 8-102 
191 =—119 
3,666 —191 —191 —191 
3,162 —191 —191 —573 
2,658 —191 —191 
2,154 —191 
1,650 —1,146 
3,895 -275 —275 
3,391 -275  —550 
2,887 
2,383 —275 —1,100 
1,879 —1,375 
17,596 —1,879 1,879 -—1,879 1,879 
15,076 —1,879 1,879 —3,758 
12,556 —1,879 —5,637 
. 10,036 —7,516 
15,087 —2,509 —2,509 —2,509 
12,567 —2,509 —5,018 
10,047 —7,527 
11,738 3,349 —3,349 
9,218 —6,608 
7,129 —4,609 
-1 -1 -1 -1 -1 -1 
-1 -1 -1 -1 -3 
-1 -1 -1 -1 -1 a4 
-1 -1 -1 —5 
-1 -1 -1 -6 
-1 -1 -7 
a.) 
872 —19 —19 —19 —19 —19 —19 —19 
7822-19 —19 -19 -19 —38 
602 —19 —19 —19 —57 
602 —-19 —19 -19 —19 
512 —19 -19 -19 —95 
—19 -19 
332-190 —133 
242 —152 
3,367 —121 121 — 121 
3,007 —121 —121 -121 -121 —121 
2,647 —121 —-121 
1,927 —121 —121 
1,567 —726 
1,207 —847 360 
22,362 —1,207 1,207 -—1,207 —1,207 -—1,207 1,207 15,120 . 
19,842 —1,207  —1,207 —1,207 1,207 —2,414 12,600 
17,322 —1,207 —1,207 —1,207 —3,621 10,080 
14,802 1,207 1,207 —4,828 7,560 
12,282 —1,207 —6,035 5,040 
9,762 —7,242 2,520 
: 20;735 1,627 —1,627 1,627 1,627 —1,627 12,600 
18,215 1,627 —1,627 --1,627 —3,254 10,080 
15,695 1,627 —1,627 —4,881 7,560 
13,175 —1,627 —6,508 5,040 
10,655 —8,135 2,520 
18,604 —2,131 2,131 -—2,131 2,131 10,080 
16,084 —2,131 —2,131 —4,262 7,560 
13,564 —2,131 —6,393 5,040 
11,044 —8,524 2,520 
15,843 2,761 2,761 —2,761 7,560 
13,323 —2,761 —5,522 5,040 
10,803 —8,283 2,520 
12,242 —3,601 —3,601 5,040 
7 9,722 —7,202 2,520 
8 7,381 —4,861 2,520 


TABLE VI

THE EXACT COEFFICIENTS ($w_{2i}$) IN THE BEST LINEAR ESTIMATE OF $\sigma$
FOR THE TWO-PARAMETER SINGLE EXPONENTIAL DISTRIBUTION
FROM SINGLY AND DOUBLY CENSORED SAMPLES

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n Ts yo Ys Ya Ys Ye Yr Ys Yo Yo Div. 
3 0 -2 1 1 
0 1 —2 2 
1 0 1 
4 0 -—3 1 1 1 
1 -—3 1 2 
2 -3 3 
0 -2 1 1 
1 -—2 2 
0 -1 1 
5 —4 1 1 1 1 
4 1 1 2 
1 3 
—4 4 
-3 1 1 1 
-3 1 2 
-3 3 
—2 1 1 
2 
-1 1 
6 -—65 1 1 1 1 1 
-5 1 1 1 2 
1 1 3 
-5 1 4 
5 
—4 1 1 1 
1 2 
1 
4. 
-3 1 
-3 
-3 
1 
1 
7 -—6 1 1 1 1 1 1 
—-6 1 1 1 1 2 
-6 1 1 1 3 
-6 1 1 4 
-6 1 5 
6 
-5 1 1 1 1 1 
—5 1 1 1 2 
-5 1 1 3 
1 4 
5 
—-4 1 1 1 1 
—-4 1 1 2 
—4 1 3 
4 
-3 1 1 1 
-3 1 2 
-3 3 
1 1 
—2 2 
-1 1 


n "1 Ys Ye Yr Ys ye wo Div. 
8 —7 1 1 1 1 1 1 1 
—-7 1 1 1 1 1 2 
-7 1 1 1 1 3 
—7 1 1 1 4 
-7 1 1 5 
1 
—6 1 
—6 1 
—6 1 
6 
1 1 
—5 2 
—5 
—5 
-5 
—4 1 1 1 1 
—4 1 1 2 
—4 1 3 
—4 4 
—3 1 1 1 
—3 1 2 
-3 3 
-2 1 1 
2 
-1 1 
9 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 2 
1 1 1 1 ae 
1 1 1 1 4 
1 1 1 5 
1 1 6 
1 7 
8 
-7 1 1 1 1 1 1 1 
-7 1 1 1 1 1 2 
—7 1 1 1 1 3 
-7 1 1 1 4 
-7 1 1 5 
-7 1 6 
7 
—6 1 1 1 1 1 1 
—6 1 1 1 1 2 
—6 1 1 1 3 
—6 1 1 4 
-6 6 
-—§ 1 1 1 1 1 
-§ 1 1 1 2 
—§ 1 1 3 
—-5 1 4 
6 
1 1 
1 
—4 1 
4 
-3 1 
—3 
- 1 
-1 1 


TABLE VI—(continued) 


n nef wh Ww YW Yr Ys Me Div. 
| | | | | | 
2 
-8 1 1 1 1 1 1 1 1 
-8 1 1 1 1 1 1 2 
-8 1 1 1 1 1 3 
—8 1 1 1 1 4 
-8 1 1 1 5 
} -8 1 1 6 
-8 1 7 
8 
-7 1 1 1 1 1 1 1 
-7 1 1 1 1 1 2 
-7 1 1 1 1 3 
—-7 1 1 1 4 
-7 1 1 5 
-7 1 6 
-7 7 
—6 1 1 1 1 1 1 
—6 1 1 1 1 2 
—6 1 1 1 3 
—6 1 1 4 
—6 1 5 
~6 6 
-5 1 1 1 1 1 
-5 1 1 1 2 
—-5 1 1 3 
-5 1 4 
-5 5 
-4 1 1 1 1 
—4 1 1 2 
—4 1 3 
4 
—3 1 1 1 
-3 1 2 
3 
1 1 
-2 2 
-1 1 


DISTRIBUTION FROM SINGLY AND DOUBLY CENSORED SAMPLES
(The coefficients in each row must be divided by the common divisor
given in the last column of each row)

3 0 0 2 2 2
4 
1 0 5 1 
3 3 3 3 
3 6 
9 
4 5 5 
2 10 
4 4 
u 
22 
13 13 
26 
5 5 5 
5 10 
15 
19 19 19 
19 38 
57 
23 23 23 
23 6 
69 
114 3 3 
54 a 
6 
& 
| 
41 41 
: 41 41 82 
164 
95 95 95 95 
95 % 190 
/ 9 «285 
380 
307 
307 


TABLE VII (continued)

452 307 921 ,680 
—388 1,228 R40 
2,229 97 97 97 ,520 
1,389 97 194 , 680 
549 291 840 
2,046 —183 —183 ,680 
1,206 —366 840 
1,443 —603 840 
g 8 8 8 8 8 8 8 8 8 72 
-1 8 8 8 8 16 63 
—10 8 8 8 24 54 
—19 8 8 8 8 32 45 
—28 8 8 8 40 36 
—37 8 8 48 27 
—46 8 56 18 
: —55 64 
119 55 55 55 55 55 55 55 504 
47 55 55 55 55 55 110 432 
—25 55 55 55 55 165 360 
-97 55 55 55 220 ’ 288 
—169 55 55 275 216 
—241 55 330 144 
—313 385 72 
1,146 313 313 313 313 313 313 4 
642 313 313 313 313 626 
138 313 313 313 939 16 
—366 313 313-1, 252 12 
—870 3131, 565 8 
—1,374 1,878 
1,375 229 229 229 229 229 
871 229 229 229 458 16 
367 229 229 687 12 
—137 229 916 8 
—641 = 1,145 4 
7,516 641 641 641 641 
4,996 641 641 1,282 
2,476 641 1,923 
—44 564 
7,527 ll ll ll 
5,007 11 22 
2,487 33 
6,698 —829 —829 
4,178 —1,658 
4,609 —2,089 
10 9 4 9 a 9 90 
-1 9 9 9 9 9 9 18 80 
9 9 9 27 70 
—21 9 9 36 60 
—31 a 9 a 45 50 
—41 9 9 54 40 
9 63 30 
-61 9 20 
-71 81 10 
152 71 71 71 71 71 71 720 
62 71 71 71 71 142 630 
—28 71 71 71 213 540 
’ —118 71 71 284 450 
—208 71 355 360 
—298 426 270 
—388 180 
—478 90 
239 239 239 2,520 
239 478 2,160 
717 1,800 
t 1,440 
1,080 
720 
—1,313 1, 360 
: 1,313 1,313 1,313 15,120 
4,722 3 1,313 2,626 12,600 
2,202 3 3,939 10,080 
—318 3 7,560 
—2,838 3 5,040 
—5,358 8 27520 
: 893 893 893 12,600 ; 
893 1,786 10,080 
5 7,560 
5 5,040 
- 2,520 
389 389 10,080 
778 7,560 
67 5,040 
2,520 
—241 —241 —241 7,560 
—241 —482 5,040 
3,243 —723 2,520 
7,202 —1,081 —1,081 5,040 
4,682 —2,162 2,520 
4,861 —2,341 2,520 
TABLE VIII


VARIANCES OF THE BEST LINEAR ESTIMATE OF μ FOR THE TWO-PARAMETER
SINGLE EXPONENTIAL DISTRIBUTION FROM SINGLY AND
DOUBLY CENSORED SAMPLES (IN TERMS OF σ²)


1 4 


.2222222 


- oO 


-0937500 
-5138889 


-0533333 
-2037500 
8411111 


-0347222 
- 1125926 
-3204167 
-1161111 


-©244898 
-0721372 
-1747241 
-4391241 
-4561338 


-0178571 .0182292 .0187500 

-0479911 .0503827 .0539700 

-1015781  .1110137 .1267479 
-2380178 .3051212 
-5575021 


-O111111 

-0279167 

-0541093 
-0966139 . 1157315 
-1605255 . -2251048 
-3049312 -4837058 
- 5887952 

1.3207429 2.3417180 
4.2706863 


0 
1 
2 
3 
0 
1 
2 
3 
4 
0 
1 
2 
3 
4 
5 
0 
1 
2 
3 
4 
5 
6 
0 
1 
2 
3 
4 
5 
6 
7 
0 
1 
2 
3 
4 
5 
6 
7 
8 


0 5 6 7 
1 0555556 
4 
.3437500 
1.597222 
5 0500000 .0600000 .0800000 
5204167 
2.105556 
6 0333333 0370370 .0416667 0555556 
1013889 1350000 .2022222 
. 2570370 .5105556 
6926389 1 
2.5038889 
7 0238095 0255102 .0272109 .0306122 .0408163 
0673469 .0801209 .0960884 .1439909 
. 1530896 .2179932  .8478005 
2425208 7275624 
7280669 1 
3 .0489900 
8 0195313 0208333 .0234375 0312500 
.0599490 .0719060 . 1077806 
1582164 .2526219 
5064314 
.9565101 1.7605981 
3.4159552 
9 .0138889 .0141093 .0144033 .0148148 0154321 .0164609 .0185185 0246914 
.0359347 0372621 .0391204 .0419078 .0465535 0558449 .0837191 
0723150 .0771022 .0842831 .0962512 .1201873 .1919958 
.1357001 1505860 .1753058 .2250154 .3738741 
.2551495 3014804 .3941423 .6721281 
6090863 .6743011 1. 1699455 
1.1728460 2.0550244 
3.884026. 
10 .0116667 0120000 .0125000 .0133333 .0150000 .0200000 
.0312593 .0334877 .0372016 .0446296 0669136 
.0662133 0756276 .0944560 .1509414 
1348490 .1730841 .2877804 
.2945788 .5030011 
8412550 


90.00 84.38 75.00 56.25 
7.23 


29.83 24.87 16.59 


85.71 76.19 57.14 


11.56 
3.71 


TABLE IX
PERCENTAGE EFFICIENCIES OF μ* RELATIVE TO THE BEST LINEAR ESTIMATE
97.22 93.33 87.50 77.78 58.33 


33.01 29.72 24.78 16.54 
100.00 75.00
15.79 


1 5.35
2 15.55 13.63 10.92 6.85 
3 6.84 5.42 3.27 
4 3.27 1.64 
5 -78 
8 97.96 95.24 91.43 yy 
35.44 33.09 29.79 24.83 16.57 
16.09 14.09 11.29 7.07 
5.50 5.35 3.53 
3.20 1.88 
1.01 
9 98 96 
35 
16 
7 
3 
1 
|_| 97 55.56 
37 
18 
9 
4 
2 
TABLE X


EXACT VARIANCES OF THE ESTIMATE OF STANDARD DEVIATION σ IN
THE TWO-PARAMETER SINGLE EXPONENTIAL DISTRIBUTION FROM
SINGLY AND DOUBLY CENSORED SAMPLES (IN TERMS OF σ²)


T: 


4 


0 
1 
0 
1 
2 
0 
1 
2 
3 
0 
1 
2 
3 
4 
0 
1 
2 
3 
4 
5 
0 
1 
2 
3 
4 
5 
6 
0 
1 
2 
3 
4 
5 
6 
7 
0 
1 
2 
3 
4 
5 
6 
7 
8 


0 1 2 3 : 5 6 7 8 
3 
1 
4 
1 
5 12 1A 
1/3 1/72 1 
A 
6 1/5 1/4 178 1 
14 178 1/72 1 
172 1 
1 
7 16 1/6 1/8 172 1 
1/5 1/4 1/8 1/2 1 
1/73 1/72 1 
1/72 1 
1 
8 1/7? 1/6 1/5 1/4 1/78 172 
1/6 1/5 1/4 1/8 1/72 1 
1/5 1/78 1/72 1 
1A 
1 
9 1/77 1/76 1/6 14 1/8 1/72 1 | 
1/7 1/6 1/5 1/4 1/8 1/72 
16 1/6 1/4 1/8 1/2 1 
1/5 174 1/8 12 1 
1/4 178 1/2 1 
1/73 1/2 1 
1 
10 19 1/8 1/77 16 1/6 1/4 178 
1/8 1/5 1/4 1/78 1 
1/6 138 12 
1/5 1/4 1/8 1/2 
1 


TABLE XI 
100.00 66.67 33.33
66.67 33.33 
33 .33 


0 
1 
2 


S85 


42.86 28.57 14.29 


85.71 71.43 57.14 42.86 28.57 14.29 
71.43 57.14 42.86 28.57 14.29 


57.14 42.86 28.57 14.29 


42.86 28.57 14.29 
28.57 14.29 
14.29 


100.00 85.71 71.43 57.14 


0 
1 
2 
3 
4 
5 
6 


n T1 
0 1 2 3 4 5 6 7 8 
3 0 100.00 50.00 
1 50.00 
5 0 100.00 75.00 50.00 25.00 
1 75.00 50.00 25.00 
2 50.00 25.00 
3 25.00 
6 0 100.00 80.00 60.00 40.00 20.00 
1 80.00 60.00 40.00 20.00 
2 60.00 40.00 20.00 
3 40.00 20.00 
4 20.00 
7 1 16.67 
1 
2 
3 
4 
5 
8 | 
9 100 12.50 
8 
7 
6 
3 
2 
1 
10 78 66.67 55.56 44.44 33.33 22.22 11.11 
67 55.56 44.44 33.33 22.22 11.11 
56 44.44 33.33 22.22 11.11 
44 33.33 22.22 11.11 
33 22.22 11.11 
| 22 11.11 
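
The legible entries of Table XI appear to follow a simple rule: with r1 and r2 observations censored at the two ends of a sample of size n, each entry equals 100(n - r1 - r2 - 1)/(n - 1). The Python sketch below is only a consistency check on that pattern. The formula is inferred from the printed values, not stated in the text, and the function name `efficiency_percent` is hypothetical.

```python
def efficiency_percent(n, r1, r2):
    """Percentage inferred from the legible Table XI entries.

    Assumption (not stated in the source): the entry equals
    100 * (n - r1 - r2 - 1) / (n - 1), where r1 and r2 are the
    numbers of observations censored at the two ends.
    """
    remaining = n - r1 - r2 - 1
    if remaining < 1:
        raise ValueError("need n - r1 - r2 >= 2")
    return round(100.0 * remaining / (n - 1), 2)


# Rebuild the n = 6 block so it can be compared with the printed rows.
n = 6
rows = [[efficiency_percent(n, r1, r2) for r2 in range(n - 1 - r1)]
        for r1 in range(n - 1)]
for r1, row in enumerate(rows):
    print(r1, row)
```

If the rule holds, the same function can be used to check the damaged n = 7 to n = 10 blocks entry by entry against a clean copy of the table.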
TABLE XII


VARIANCES OF THE ESTIMATE OF THE MEAN FOR THE TWO-PARAMETER
SINGLE EXPONENTIAL DISTRIBUTION FROM SINGLY AND
DOUBLY CENSORED SAMPLES (IN TERMS OF σ²)


© 


0 
1 
2 
3 
4 
5 
6 


-1111111 
- 1113316 
- 1126589 
1174461 
- 1323320 
1786630 
-3438778 
- 2269561 


© 


1036245 
-1112715 
1321138 
- 1939634 
-8954846 

1.4221381 




n 
0 1 2 3 4 5 6 7 8 
3 0 .5555556 
1.388880 
4 .2500000 .3437500 6250000 
.2604167 .3472222 
4305556 
5 .2000000 .2533333 .3600000 .6800000 
.2033333 2537500 .4050000 
2870833 .2605556 
5438889 
6 1666667 .2013889 .2502503 .3750000 7222222 
1680556 .2014815 .2683333 .4688889 
1792593 .2037500 .2772222 
.2426389 2438880 
6938880 
7 1428571 .1673469 .2040816 2653061 .3877551 .7551020 
1435374 .1673753 .2071051 2865646 .5249433 
1483277 1683749 2084604 .3287528 
1699622 .1796003 .2085147 
.2661083 2704195 
8632766 
1250000 .1482292 .1687500 .2070313 .2708333 .3984375 7812500 
1253720 .1432398 .1700415 2147109 .3040497 5720663 
1277636 .1437518 1703987 .2236926 .3835743 
1372042. 1483352 .1705974 .2373838 
1707559 .1729783 .1796457 
.3011529 .3248838 
10427409 
9 1252205 .1440329 .1703704 .2008765 .2757202 .4074074 .8024691 
1252251 1446759 .1738522 .2224704 3197338 .6114969 
1255149 .1447989 .1769300 .2405089 .4340503 
1277685 .1449725 .1793804 .2826042 
1877238 1485074 1808582 
1786661 .1786757 
3979878 
10 4150000 .8200000 
TABLE XIII

79.29 67.49 54.09 38.71 20.89 
96.50 88.75 79.21 67.17 51.52 30.32 
89.87 85.07 78.12 67.14 49.52 
75.69 74.57 72.43 66.67 
51.56 51.12 49.86 
25.29 20.44 
7.03 


-73 77.14 65.22 52.94 40.30 27.27 13.85 
-72 76.80 63.91 49.94 34.75 18.17 
.52 76.73 62.79 46.06 25.60 
Fig. 1. Efficiency per unit of waiting time for n = 10, 9 and 8 with different values of r2 in
singly censored samples from the right of the one-parameter exponential distribution.
TABLES FOR TOLERANCE LIMITS FOR A NORMAL POPULATION
BASED ON SAMPLE MEAN AND RANGE OR
MEAN RANGE


Sujit Kumar Mitra
Indian Statistical Institute, Calcutta and University of North Carolina 


Let $\bar{x}$ and r be respectively the observed mean and range in a random
sample of size n from a normal population. Values of $k_1$ are tabulated
which ensure a probability $\beta$ that the pair of limits $\bar{x} \pm k_1 r$
will include at least a proportion p of the population. For tolerance limits
of the form $\bar{\bar{x}} \pm k_2 \bar{r}$, where $\bar{\bar{x}}$ is the grand
mean and $\bar{r}$ the mean range in N samples of size m each, necessary
tables of the factor $k_2$ are provided for situations where m is either 4
or 5, these being common group sizes in control chart analysis.


1. INTRODUCTION 


One of the common methods of specifying the quality of a manufactured
product is to set limits within which a certain percentage of the products
produced under commercial conditions may be expected to lie. These limits,
known as “tolerance limits” in the literature, are more appealing to the
consumer than a specification of the mean and range because they are easier
to interpret in problems involving interchangeability of parts, etc.

The problem of setting limits that are reasonable from statistical
considerations has received considerable attention in the past. Wald and
Wolfowitz [8] have shown that if the underlying statistical universe is
normal, a pair of limits $\bar{x} \pm \lambda s$ with a suitably determined
$\lambda$ can be set such that in a large series of samples from the above
population a certain proportion $\beta$ of the intervals
$\bar{x} \pm \lambda s$ will include more than a specified proportion p of
the universe, where $\bar{x}$ is the sample mean and s is the sample standard
deviation. The constant $\beta$ is known as the confidence coefficient, since
it measures the confidence with which we can assert that the tolerance limits
will include more than p of the population. Bowker [2] has published an
extensive table of the constant $\lambda$.

It is a simple matter to extend the Wald-Wolfowitz argument to obtain
tolerance limits for a normal population of a more general form
$\bar{x} \pm kf$, where $\bar{x}$ is the sample mean as before, but f is any
positive function of the sample observations which is such that

(a) f is not equal to a constant with probability one,

(b) f is distributed independently of $\bar{x}$,

(c) the distribution function of $f/\sigma$ does not depend on the unknown
mean $\mu$ or on the unknown variance $\sigma^2$ of the underlying normal
population,

(d) the distribution function of $f/\sigma$ has continuous derivatives up to
the fourth order.

The sample standard deviation, the sample range, and the mean range in
two or more samples from the same population are the most common examples
of statistics satisfying all four conditions.

In the present paper, the limits discussed are of the form

(1) $\bar{x} \pm k_1 r$, where $\bar{x}$ is the sample mean and r is the
sample range in a sample of size n,

(2) $\bar{\bar{x}} \pm k_2 \bar{r}$, where $\bar{\bar{x}}$ is the grand mean
and $\bar{r}$ is the mean range in N samples of size m each.

The use of the range or mean range in place of the standard deviation is
suggested by their growing use in quality control and industrial work, which
is due to their computational simplicity; in fact, in many cases
$\bar{\bar{x}}$ and $\bar{r}$ are readily obtainable from standard control
charts on the quality characteristic under study.

Table 1 gives the value of the constant $k_1$ for p = 0.75, 0.90, 0.95, 0.99,
0.999, $\beta$ = 0.75, 0.90, 0.95, 0.99 and n = 2, 3, 4, ..., 20.

The tabulation of $k_2$ in Tables 2 and 5 has been restricted to m = 4 and 5,
even though (as we shall see in Sec. 4) m = 7 and 8 would lead to better
results, because m = 4 and 5 are commonly used in control chart practice.

The user of these tables must keep in mind the assumption of normality on 
which their calculation has been based. 


2. CONSTRUCTION OF TABLES 


Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a normal
population with mean $\mu$ and variance $\sigma^2$. Denote by $\bar{x}$ the
arithmetic mean of the sample observations and let f be any positive function
of the sample observations satisfying all four conditions specified in Sec. 1.

Making only minor alterations in the Wald-Wolfowitz derivation of tolerance
limits we can show that, if c is defined by

$$\frac{1}{\sqrt{2\pi}} \int_{1/\sqrt{n}-c}^{1/\sqrt{n}+c} e^{-t^2/2}\,dt = p, \qquad (2.1)$$

and if $f(\beta)$ is the lower 100(1 − β) per cent point of the $f/\sigma$
distribution, then

$$k^{*} = \frac{c}{f(\beta)} \qquad (2.2)$$

gives quite a good approximation to the true value of the tolerance factor k;
or, to be more precise,


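$$\Pr\left\{ \frac{1}{\sigma\sqrt{2\pi}} \int_{\bar{x}-k^{*}f}^{\bar{x}+k^{*}f} e^{-(t-\mu)^2/(2\sigma^2)}\,dt > p \right\} = \beta + O\!\left(\frac{1}{n^2}\right). \qquad (2.3)$$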


A simplified exposition of the Wald-Wolfowitz derivation is given by Wallis
[9], who also refers to certain unpublished calculations made by the
Statistical Research Group, Columbia University, in 1945, which indicated
that this approximation is fairly satisfactory even for as low a value of n
as 2 if f = s. For n = 2 the sample range is simply a constant multiple of s,
and for other small sample sizes the sample range or the sample mean range
$\bar{r}$ compares quite favorably with s as an estimator of the population
standard deviation. Hence, it seems plausible that this approximation will
work well even when we take for f the sample range r or the mean range
$\bar{r}$.
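
As a rough illustration of (2.1) and (2.2), the Python sketch below solves for c by bisection and then forms k* = c / f(β). It assumes that (2.1) has the usual Wald-Wolfowitz form, namely that a standard normal interval of half-width c centred at 1/√n has probability content p. The function names `coverage`, `solve_c` and `k_star` are hypothetical, and the percentage point f(β) is an input that must be read from tables; no tabulated value is supplied here.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal distribution function


def coverage(c, n):
    """Standard normal probability of the interval 1/sqrt(n) +/- c."""
    centre = n ** -0.5
    return _PHI(centre + c) - _PHI(centre - c)


def solve_c(p, n, tol=1e-12):
    """Solve coverage(c, n) = p for c by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while coverage(hi, n) < p:      # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_star(p, n, f_beta):
    """Approximate tolerance factor k* = c / f(beta), as in (2.2).

    f_beta is the lower 100(1 - beta) per cent point of f/sigma,
    taken from tables; it is not computed here.
    """
    return solve_c(p, n) / f_beta


c = solve_c(p=0.95, n=20)
print(round(c, 4))
```

Bisection is used because the coverage is strictly increasing in c, so the root is unique and no derivative is needed.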

The percentage points of the standardized range distribution were obtained
by inverse interpolation from the table of the probability integral of the
standardized range [6]. The percentage points of the standardized mean range
distribution were obtained by using Patnaik’s $c\chi/\sqrt{\nu}$
approximation [5] to the distribution of standardized mean range, where
$\chi^2$ has the well known chi-square distribution with $\nu$ degrees of
freedom, and the Cornish-Fisher type approximation [3] to the percentage
points of the $\chi^2$ distribution. The values of c for n ≤ 30 were obtained
using Thompson’s table of percentage points of $\chi^2$ [6] and Bowker’s
table of tolerance factors. For n > 30 Bowker’s approximation [1] was used
for the determination of c. All these approximations lead to the following
simple formula for $k_2$.


1 4+ 
—{1--—— 2.4 
)=( 12v 


j- +a{-242 1+ 
NM? 16 NM?) ’ 
$M = E(r/\sigma)$,
$V = \text{Variance}(r/\sigma)$,
r = the observed range in a sample of size m from $N(\mu, \sigma)$,


and are given by 


The Biometrika Tables [6] give values of M and V for various values of m. 


3. USE OF THE TABLES 


In the production of pig iron, the per cent of silicon is one of the important
quality characteristics, and is determined for each cast. The grand mean
$\bar{\bar{x}}$ and the mean range $\bar{r}$ of per cent of silicon content in
20 samples of 4 casts each at one blast furnace were found to be
$\bar{\bar{x}}$ = 0.8575, $\bar{r}$ = 0.3710. It is desired to know, with a
confidence coefficient of 0.95, limits within which the per cent of silicon
of 95 per cent or more of the future casts will lie.

Entering Table 2 with p = 0.95 and $\beta$ = 0.95 we find for N = 20,
$k_2$ = 1.129. The required tolerance interval in per cent is then
0.8575 ± 1.129 (0.3710). Hence we may assert with 95 per cent confidence that
at least 95 per cent of the future casts will have a per cent of silicon
content between 0.4386 and 1.2764.
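
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the calculation "grand mean plus or minus the tabulated factor times the mean range"; the function name `tolerance_interval` is hypothetical, and the factor 1.129 is the value quoted above from Table 2.

```python
def tolerance_interval(grand_mean, mean_range, k2):
    """Return (lower, upper) for grand_mean +/- k2 * mean_range."""
    half_width = k2 * mean_range
    return grand_mean - half_width, grand_mean + half_width


# Figures quoted in the pig-iron example (N = 20 samples of m = 4 casts).
lower, upper = tolerance_interval(grand_mean=0.8575, mean_range=0.3710, k2=1.129)
print(f"{lower:.4f} to {upper:.4f}")   # prints 0.4386 to 1.2764
```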


1 
where 
FH ~2, 
1 
1 “a 
Pp. 


4. CONCLUSION


It is known that the efficiency of range as an estimator of the population 
standard deviation decreases rapidly for increasing sample size. This will mean 
that the ratio of the expected length of tolerance intervals based on range to 
that based on standard deviation will increase rapidly with n the sample size. 
The question of a proper subdivision of a sample which will give the smallest 
variance for the mean range as an estimator for the population standard devia- 
tion has been considered by Pearson [7] and by Grubbs and Weaver [4]. The 
following table seems to be in agreement with the latter’s observation that a 
subgroup of size 7 or 8 might lead to the best possible subdivision of the total 
sample. 


RATIO OF EXPECTED LENGTH OF TOLERANCE INTERVAL BASED
ON MEAN RANGE TO THAT BASED ON STANDARD
DEVIATION, FOR VARIOUS POSSIBLE SUBDIVISIONS
OF A TOTAL SAMPLE SIZE OF 20


Sample    No. of     Values of the Confidence Coefficient $\beta$
size      samples        0.90        0.95

  20         1          1.044       1.051
  10         2          1.015       1.015
   5         4          1.018       1.019
   4         5          1.025       1.029
   2        10          1.597       1.718


STATISTICAL ABSTRACTS


All communications concerning this section should be addressed to the
Abstracts Editor, Dr. Walter L. Smith, Statistical Laboratory, St. Andrew’s
Hill, Cambridge, England.


Bartholomew, D. J., “A sequential test for randomness of intervals,”
Journal of the Royal Statistical Society (B), 18 (1956), 95-103.


Let $t_1, \ldots, t_n$ be a sample of intervals. A Wald sequential test is
given to test whether the t’s follow an exponential distribution (and can
thus be considered as occurring randomly at a rate $\mu$, say) against the
alternative of a Type III distribution. The author first considers the case
$\mu$ known, and then, by a sequential transformation, the case $\mu$
unknown; he shows that the OC of the latter test is practically the same as
that of the former while the ASN’s are in an approximately constant ratio.
E. S. Page, University of Durham, England.


Barton, D. E., and David, F. N., “Some notes on 
ordered random intervals,” Journal of the 
Royal Statistical Society (B), 18 (1956), 79-94. 


Suppose n − 1 points to be placed at random on a line of unit length and let
$g_1, \ldots, g_n$ be the ordered set of intervals obtained. It is shown how
to derive the joint distribution of any number of the g’s, its moments and
generating function. Limiting univariate distributions of upper and lower
extreme values, and of the quantiles, are given, together with some of the
corresponding results for the multivariate case. The distributions of ratios
and differences of the g’s are also derived. E. S. Page, University of
Durham, England.


Binet, R. E., and Watson, G. S., “Algebraic 
theory of the computing routine for tests of 
significance on the dimensionality of normal 
multivariate systems,” Journal of the Royal 
Statistical Society (B), 18 (1956), 70-8. 


Tests of significance to discover the structural relationships of a set of
k normal p-variate populations have been given by Fisher and (essentially)
by Wilks. An algebraic proof is given for the identity of the statistics
obtained by both methods, and it is shown that Fisher’s method requires the
inversion of a p × p matrix, while Wilks’ needs that of a (k − 1) × (k − 1)
matrix. E. S. Page, University of Durham, England.


Champernowne, D. G., “An elementary method of solution of the queueing
problem with a single server and constant parameters,” Journal of the
Royal Statistical Society (B), 18 (1956), 125-8.


The distribution of queue size is derived directly for the single server
queue with independent exponential input and service times.
E. S. Page, University of Durham, England.


David, F. N., and Johnson, N. L. “Some tests 
of significance with ordered variables,” Journal 
of the Royal Statistical Society (B), 18 (1956), 
1-20. 


This paper describes some quick tests of significance applicable to data
which have been censored in several ways. Suppose first that the largest
observations of a sample are censored. Tests for a normal population mean,
standard deviation known, were given in an earlier paper, based on the
sample median and on a set of the least observations. A proposed test for
the normal population standard deviation having a specified value,
$\sigma_0$, is based on a statistic I, the sample interquartile distance
divided by $\sigma_0$; the distribution of I is fitted by $\chi$ and
$\chi^2$ forms, and upper percentage points calculated by both fits are
shown to differ little. To test for a normal mean having a specified value,
variance unknown, the authors use a ratio statistic R and fit its
distribution by a Pearson curve; some significance points are tabulated.
Tests are also given for the equivalence of parameters of location in two
populations, when the S.D.’s are assumed known, both when the medians are
available and when they are not, and when the S.D.’s are unknown but assumed
equal. Among other tests quoted are some based on order statistics for
testing normality, and their asymptotic relative efficiencies are
calculated. A bibliography of the theory of order statistics lists papers
appearing after the bibliography by Wilks (1947). In the discussion
following the paper H. E. Daniels points out that modifications of the Brown
and Mood median tests can provide other tests for the situations envisaged
in the paper. E. S. Page, University of Durham, England.


De Baun, Robert M., “Block effects in the de- 
termination of optimum conditions,” Biometrics, 
12 (1956), 20-2. 

Second-order “central composite” designs for the determination of optimum
levels of a set of continuously variable factors consist of a factorial
design in full, or suitable partial replication in the n factors under
study, plus a cross-polytope of radius a with 2n points on the exterior of
the design and q replications of the central point.

The first-order response surface is $Y = B_0 + B_1 X_1 + \cdots + B_n X_n$,
where $X_i = \pm 1$. The inclusion of additional points allows the
second-order surface

$$Y = B_0 + \sum_{i} B_i X_i + \sum_{i} B_{ii} X_i^2 + \sum_{i<j} B_{ij} X_i X_j$$

to be fitted to the yield data. Adding one or more replications at the
center $(0, \ldots, 0)$ in the preliminary block experiments permits
estimating the $B_{ii}$ and also makes the location of the maximum more
precise by reducing correlation among the estimates of the $B_{ii}$. The
inclusion of central points in the various blocks that might arise enables
the experimenter to balance block effects, rotatability and uniform
distribution. Albert Romano, Virginia Polytechnic Institute.


Deemer, W. L., Jr., and Votaw, D. F., Jr., “Estimation of parameters of
truncated or censored exponential distributions,” Annals of Mathematical
Statistics, 26 (1955), 498-504.

Authors’ summary: “This paper gives maximum likelihood estimators of
parameters of truncated and censored exponential distributions, asymptotic
variances of the estimators, and asymptotic confidence intervals for the
parameters. Applications to bombing accuracy studies and to life testing are
pointed out. As regards bombing accuracy the parameter estimated is the
reciprocal of the variance in a normal bivariate distribution having
circular symmetry. The reciprocal is estimated because there is no maximum
likelihood estimator of the variance and any estimator of the variance is
badly biased. Results of a synthetic sampling experiment are given to
provide information on rapidity of convergence of the distributions of the
estimators to their asymptotic distributions.” W. J. Hall, Communicable
Disease Center, USPHS.


Finney, D. J., “Multivariate analysis and agricultural experiments,”
Biometrics, 12 (1956), 67-71.

The use of multivariate analysis of variance 

and the construction of canonical variates in the 
analysis and interpretation of agricultural and 
other experiments is severely criticized. The 
general test of variety differences provided by 
the canonical multivariate approach is of little 
practical value when large numbers of varieties 
are under consideration. The emphasis in such 
experiments should rather be placed on problems 
of estimation and these problems should be for- 
mulated in terms of measures that the investigator 
considers relevant to the judgments he must 
make. Without such interpretation in terms of 
estimation the application of certain multivariate 
techniques is inappropriate and often actively 
misleading. A. E. Garratt, Virginia Polytech- 
nic Institute. 


Fisher, Sir Ronald, “On a test of significance in Pearson’s Biometrika
Tables (No. 11),” Journal of the Royal Statistical Society (B), 18 (1956),
56-60.

The author shows that use of tables for Welch’s test for the Behrens-Fisher
problem leads to the recording of significance considerably more frequently
than the level of significance in the table. Fisher writes “I have no doubt
that the error of these calculations consists in ignoring the fact that the
table is entered with $s_1/s_2$ (the ratio of sample standard errors), or
some equivalent function, and therefore that each tabular entry can come
into use only in that selection of cases in which this ratio is realized.”
E. S. Page, University of Durham, England.


Fix, Evelyn, and Hodges, J. L., Jr., “Significance probabilities of the
Wilcoxon test,” Annals of Mathematical Statistics, 26 (1955), 301-12.


This paper is concerned with the Wilcoxon 
unpaired two-sample rank test. By developing 
a combinatorial identity, the authors have pre- 
pared tables from which exact values of the dis- 
tribution of the Wilcoxon statistic may be ob- 
tained when the smaller sample size m does not 
exceed 12. A discussion of other available tables 
is given. Also, simple formulas are developed for 
the coefficients of the Edgeworth series for this 
distribution to terms of order 1/m*. A numerical 
investigation indicates considerable improvement over the normal
approximation and reliability to about 4D when m = 12. W. J. Hall,
Communicable Disease Center, USPHS.


Girshick, M. A., Rubin, H., and Sitgreaves, R., “Estimates of bounded
relative error in particle counting,” Annals of Mathematical Statistics,
26 (1955), 276-85.


This paper considers methods of estimating the parameter $\lambda$ of a
Poisson distribution, say the average number of particles (or events) per
unit area (or time), with an estimate which, with confidence coefficient
$\alpha$, is in error by no more than $100\gamma$ per cent of $\lambda$;
$\alpha$ and $\gamma$ being specified in advance. Such an estimate of
“bounded relative error” is derived, based on a sampling procedure in which
the area is continuously expanded until a fixed number M of particles have
been counted. Methods are given for determining the minimum M, given
$\alpha$ and $\gamma$.

Three variations on this sampling procedure are also considered, each
affording estimates of bounded relative error: (1) counts are made on each
of k subsamples until specified numbers in each subsample have been
obtained; (2) areas of subsamples of subsequent counts are determined by the
area required for an initial count of r particles; the subsequent counts are
made sequentially until a total of M particles have been counted; (3)
similar to the preceding method with the number of subsequent counts
predetermined. Finally, an estimate of bounded relative error for the
variance of a normal distribution is given. W. J. Hall, Communicable Disease
Center, USPHS.


Grant, D. A., “Analysis-of-variance tests in the 
analysis and comparison of curves,” Psychologi- 
cal Bulletin, 53 (1956), 141-54. 


Extending Alexander’s technique for the analysis via orthogonal polynomials
of trends based upon repeated measurements of the same individuals “by
further analysis of the orthogonal components of the trend and by providing
for the separation of orthogonal components of differences between groups,”
Grant presents in detail an illustrative experiment involving a high and a
low anxiety group subdivided into three “shock” subgroups each. The
dependent variable consisted of perseverative error scores on five
consecutive trials for the four individuals in each subgroup. Julian C.
Stanley, University of Wisconsin.


Grundy, R. M., Healy, M. J. R., and Rees, D. H., “Economic choice of the
amount of experimentation,” Journal of the Royal Statistical Society (B),
18 (1956), 32-49.


Suppose that an initial experiment of $n_1$ observations has been performed
in order to try to decide whether it is economically advantageous to replace
an old process by a new one. It is desired to minimize the cost of
experimentation plus the expected loss due to wrong decisions, the latter
depending on the unknown benefit, $\theta$, of the new process, which can be
estimated from the initial experiment. The authors consider the case in
which just one further experiment (of size $n_2$, say) can be performed, and
assume that the estimates of $\theta$, $x_1$ and $x_2$, from the two
experiments are normally distributed with known variances $\sigma^2/n_1$ and
$\sigma^2/n_2$. The problem posed by the ignorance of $\theta$ is avoided by
introducing an integrated risk, averaging with respect to the fiducial
distribution of $\theta$ given the initial sample (Box points out that this
device is equivalent to assuming a uniform prior distribution), and
minimizing the integrated risk with respect to $n_2$. The general
consequences of the decision rule are shown to be reasonable, and a nomogram
to determine the amount of second stage experimentation is given. This
procedure is compared with a similar two-stage one designed to maximize the
integrated gain per unit outlay, and with a minimax single stage procedure,
and it is shown to have some advantages over both alternatives.
E. S. Page, University of Durham, England.


Hewlett, P. S., and Plackett, R. L., “The relation between quantal and
graded responses to drugs,” Biometrics, 12 (1956), 72-8.


A hypothesis is discussed that connects the dose-response relationship for
quantal responses with that for graded responses. The hypothesis is that an
individual organism responds quantally if an underlying quantitative change
that results from administration of the drug, and that can be regarded as a
graded response, reaches a certain level of intensity characteristic of that
individual organism. Experimental data suitable for testing the predictions
from the theory were not found in the literature, but such as are relevant
lend slight support. To obtain suitable data, special experiments would
probably need to be carried out. Clyde Y. Kramer, Virginia Polytechnic
Institute.


Jackson, R. R. P., “Random queueing processes with phase-type service,”
Journal of the Royal Statistical Society (B), 18 (1956), 129-32.


“Customers arrive at random and are served in order of arrival at each of a
number of counters arranged in series. Each counter of the series is
permitted to have many servers, and the distribution of service time for a
customer at each phase of service (e.g. at each counter) is supposed to be
negative exponential. The equilibrium solution for the general case is
obtained, but more detail is given for the single server system.”
[Author’s summary.] E. S. Page, University of Durham, England.
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Johnson, O. G., and Stanley, J. C., “Attitudes 

toward authority of delinquent and nondelin- 

quent boys,” Journal of Abnormal and Social 
Psychology, 51 (1955), 712-6. 


Three bipolar attitude-toward-authority vari- 
ables generate eight projective-test picture cards. 
Two comparable sets of these (16 cards) were 
administered to 20 delinquent and 20 nondelinquent
boys, and the response to each card was
evaluated on a 0 through 4 submissiveness- 
hostility scale to yield 640 scores. For the split- 
plot analysis of variance, the within-plot replica- 
tion provides an error term with which to test 
the significance of each of the seven interactions 
involving individuals, usually not possible. This 
procedure for devising and analyzing “struc- 
tured” tests should have wide applicability. 
Julian C. Stanley, University of Wisconsin.


Kimura, Motoo, “Random genetic drift in a tri- 
allelic locus; exact solution with a continuous 
model,” Biometrics, 12 (1956), 57-66. 


Consider a randomly mating population of 
size N. Denoting the three alleles by A_1, A_2, and
A_3, we then define φ(x, y | p, q; t) to be the density
of the conditional probability that the frequency
of A_1 lies between x and x+dx and that of A_2 lies
between y and y+dy in the tth generation given
that they started from x=p and y=q at t=0.
Previous asymptotic results for φ(x, y | p, q; t)
are summarized and then the exact solution is 
obtained by the use of partial differential equa- 
tions. The final result shows that the distribu- 
tion surface decreases in height at the rate of 
3/(2N) per generation. This method can be ex- 
tended to a larger number of alleles but the 
mathematics becomes considerably more complicated.
R. E. Walpole, Virginia Polytechnic Institute.


Lukacs, Eugene, “A characterization of the gam- 
ma distribution,” Annals of Mathematical 
Statistics, 26 (1955), 319-24. 


The author proves the following theorem: 
Let X and Y be two nondegenerate, positive, 
independently distributed random variables. 
Then X+Y and X/Y are independently distrib- 
uted if and only if both X and Y have gamma 
distributions with the same scale parameter. 
W. J. Hall, Communicable Disease Center,
USPHS. 


Moran, P. A. P., “A test of significance for an 
unidentifiable relation,” Journal of the Royal 
Statistical Society (B), 18 (1956), 61-4.


A linear relation between two variates both 
of which are measured with error, supposed
normally distributed, cannot be identified, but
it must lie in the acute angle between the two 
regression lines of the bivariate normal surface. 
A test with an upper bound to the significance 
levels is given for the hypothesis that the under- 
lying structural relation passes through a given 
point; the test consists of two one-sided t-tests 
on the intersections with the axes of the sample 
regression lines. E. S. Page, University of Durham,
England.
Morrison, Milton, “Fractional replications for
mixed series,” Biometrics, 12 (1956), 1-34.


New designs called “estimation designs” are
obtained by expressing the total number of
possible data points in the form
x_n(2^n) + x_{n-1}(2^{n-1}) + ... + x_1(2) + x_0.
As an example, given five independent variables,
four at two levels and one at three levels, this
may be expressed as


2^5 (a_1a_2)(b_1b_2)(c_1c_2)(d_1d_2)(e_1e_2) + 2^4 (a_1a_2)(b_1b_2)(c_1c_2)(d_1d_2)(e_3)


where x_5 = x_4 = 1 and x_3 = x_2 = x_1 = x_0 = 0. The
method of fractional replications is used to complete
the design. One must use methods of estimation
in forming the design; however, least squares
estimates can be made from the completed design.
A method of estimating “missing observations”
is indicated and consideration given to
the precision of these estimates. A numerical
example is included. Leroy S. Brenna, Virginia
Polytechnic Institute.


Norton, H. W., “One likelihood adjustment may
be inadequate,” Biometrics, 12 (1956), 79-81.


Fisher states that a single adjustment on an
inefficient maximum likelihood estimate suffices.
Norton shows that it does not always suffice,
and that the more inefficient the estimate, the
more successive adjustments are necessary. This
applies a fortiori to simultaneous estimation of
two or more quantities, where often the iterative
process should be repeated until adjustments are
small and the covariance matrix is stable. R. H.
Virginia Polytechnic Institute.


Ramachandran, K. V., “Contributions to simultaneous
confidence interval estimation,” Biometrics,
12 (1956), 51-6.


Simultaneous confidence interval estimation is
applied to two situations. The first is a one-way
classification with n+1 observations per classification,
the classifications having heterogeneous
error variances σ_i^2 (i = 1, 2, ..., k). Simultaneous
confidence intervals are found for all ratios of
the variances, i.e., σ_i^2/σ_j^2 (i ≠ j; i, j = 1,
2, ..., k). Further, an associated test for the
homogeneity of variances is given. The second
situation is a t^p factorial experiment in which all
p main effects are unconfounded. Here a procedure
for finding simultaneous confidence intervals
for all the main effects is developed. Examples
illustrating both procedures are given. John
J. Gart, Virginia Polytechnic Institute.


Skinner, B. F., “A case history in scientific
method,” American Psychologist, 11 (1956),
221-33.


Tracing the development of his learning experimentation
and showing how his methods
depart from the statistician’s design of experiments
paradigm, Harvard psychologist Skinner
emphasizes his belief that achieving practical
control over the performance of each organism
obviates the need for statistical techniques and
theories of behavior. “We are within reach of a
science of the individual. . . . In the experimental
analysis of behavior we address ourselves to a
subject matter which is not only manifestly the
behavior of an individual and hence accessible
without the usual statistical aids but also ‘objective’
and ‘actual’ without recourse to deductive
theorizing. . . . Statistical techniques serve a
useful function, but they have acquired a purely
honorific status which may be troublesome. . . .
It is time to insist that science does not progress
by carefully designed steps called ‘experiments’
each of which has a well-defined beginning and
end. Science is a continuous and often a disorderly
and accidental process. . . . What the
statistician means by the design of experiments
is design which yields the kind of data to which
his techniques are applicable. He does not mean
the behavior of the scientist in his laboratory
devising research for his own immediate and
possibly inscrutable purposes.” Julian C. Stanley,
University of Wisconsin.


Teichroew, D., “Empirical power functions for
nonparametric two-sample tests for small samples,”
Annals of Mathematical Statistics, 26
(1955), 340-4.


The author is concerned with rank order tests 
for testing if two samples come from the same 
population. The test under study is obtained by 
considering the various possible rankings of the 
observations from the two samples and choosing 
as the critical region those rankings with the 
greatest power against the simple alternative 
considered. In particular, he considers the prob- 
lem of testing the hypothesis that two samples 
come from the same normal population against 
the alternative that they come from normal 
populations with the same variance σ^2 but with
means differing by δσ. The optimum rank test
for small δ is Hoeffding’s c_1 test. By examining
empirical frequencies of the various rankings for
various δ’s and small sample sizes, which are
tabled in the paper, the author suggests that it 
may be possible to construct uniformly most 
powerful rank order tests for these hypotheses. 
W. J. Hall, Communicable Disease Center,
USPHS. 
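
The enumeration of rankings on which such tests rest can be illustrated by a minimal sketch. It assumes no ties and uses the Wilcoxon rank sum as a stand-in statistic; it is not the c_1 statistic, and it does not reproduce the frequencies tabled in the paper. The function name `rank_sum_null_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_null_distribution(m, n):
    """Exact null distribution of the rank sum of the first sample.

    If both samples come from the same population, every choice of m
    ranks out of 1..m+n for the first sample is equally likely, so the
    distribution is found by listing all such choices.
    """
    all_ranks = range(1, m + n + 1)
    counts = Counter(sum(ranks) for ranks in combinations(all_ranks, m))
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}


dist = rank_sum_null_distribution(3, 3)
# Probability that the first sample occupies the three lowest ranks.
p_lowest = dist[6]
```

A critical region is then a set of rankings whose total null probability does not exceed the chosen level; which rankings to include depends on the alternative in view.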


Watson, G. S., “Missing and ‘mixed-up’ fre- 
quencies in contingency tables,” Biometrics, 12 
(1956), 47-50. 

The procedure for dealing with the problem of
missing cell data in r × c tables analyzed by
analysis of variance techniques is well known.
The problem considered here is similar except
that it arises where frequency or “count data”
are missing or mixed-up in the cells of an r × c
contingency table to be analyzed by chi-square.

It is shown that if the frequency f_ij is missing
from cell (i, j) in an r × c contingency table
where existing row and column totals are given
by R_i (i = 1 ... r) and C_j (j = 1 ... c) respectively,
and N is the total of the existing frequencies,
then x = R_i C_j/(N - R_i - C_j) may be inserted
for the missing cell
frequency. After adding x to all affected totals,
chi-square computed in the usual way will receive
no contribution from cell (i, j) and will have
(r-1)(c-1)-1 degrees of freedom. If more than
one cell is missing, the above formula for x may
be used iteratively to give estimates for all missing
values, the degrees of freedom for chi-square
being (r-1)(c-1) less the number of missing
cell frequencies. The problem of two cell frequencies,
say f_11 and f_12, being mixed up so that it
is uncertain whether f_11 belongs in cell (1, 1) or
cell (1, 2) is also treated. Willard O. Ash,
Virginia Polytechnic Institute. 
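
The fill-in value can be checked numerically with a short sketch. The 2 × 3 table below is hypothetical, and the function names `fill_missing_cell` and `chi_square` are illustrative only; the sketch treats a single missing cell, not the iterative or mixed-up cases.

```python
def fill_missing_cell(row_total, col_total, grand_total):
    """Fill-in value x = R_i * C_j / (N - R_i - C_j) for one missing cell.

    All three totals are sums of the observed frequencies only, i.e.
    the missing cell is excluded from each of them.
    """
    return row_total * col_total / (grand_total - row_total - col_total)


def chi_square(table):
    """Pearson chi-square for a complete table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical table; the count in the first cell of the first row is missing.
R = 12 + 18                    # existing total of the first row
C = 20                         # existing total of the first column
N = 12 + 18 + 20 + 15 + 25     # total of the existing frequencies
x = fill_missing_cell(R, C, N)

completed = [[x, 12, 18],
             [20, 15, 25]]
# Expected count in the filled cell once x has been added to the totals.
expected_filled = (R + x) * (C + x) / (N + x)
stat = chi_square(completed)   # refer to (2-1)(3-1)-1 = 1 degree of freedom
```

Because the expected count in the filled cell equals x itself, that cell adds nothing to chi-square, which is why one degree of freedom is given up for it.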


Weiss, Lionel, “On confidence intervals of given
length for the mean of a normal distribution
with unknown variance,” Annals of Mathematical
Statistics, 26 (1955), 348-62.


The author’s summary states: “The problem 
of finding a confidence interval of preassigned
length and of more than a given confidence coefficient
for the unknown mean of a normal distribution
with unknown variance is insoluble if
the sample size used is fixed before sampling
starts. In this paper two-sample plans, with the
size of the second sample depending upon the
observations in the first sample [as in Stein’s two-sample
test], are discussed. Consideration is
limited to those schemes which increase the 
center of the final confidence interval by k if 
each observation is increased by k, and for which 
the size of the second sample is a function only 
of the differences among the observations in the 
first sample. Then it is shown that the mean of 
all the observations taken should be used as the 
center of the final confidence interval. Those 
schemes which make the size of the second sample 
a nondecreasing function of the sample variance 
of the first sample are shown to have certain
desirable properties with respect to the distri- 
bution of the number of observations required to 
come to a decision.” In particular, given any 
sample-size rule R not of this latter type, there 
is a corresponding rule R’ of this type such that, 
at least for “large” values of the sample size, the 
c.d.f. of the sample size will not be decreased by
using R’ instead of R. W. J. Hall, Communicable
Disease Center, USPHS.
Statistical Methods: Applied to Experiments in Agriculture and Biology, Fifth Edition.
George W. Snedecor. With Chapter 17 on Sampling by William G. Cochran. Ames, Iowa: The
Iowa State College Press, 1956. Pp. xiii, 534. $7.50.


K. A. Brownlee, University of Chicago


The appearance of a Fifth Edition of Snedecor’s Statistical Methods draws one’s
attention rather forcibly to the fact that ten years have elapsed since the publication
of the Fourth Edition, which was reviewed in Vol. 41 of this journal by D. J.
Finney (2½ pages) and W. J. Youden (2 pages), and nineteen years since the First
Edition, published in 1937. Throughout this period Statistical Methods has been be- 
loved by users of statistics in biology, and also other sciences, for talking to them in 
language that they can understand, without committing conceptual errors that bring 
down the wrath and scorn of the theoretical statistician. 

The present edition states explicitly that it is written for two groups of readers: 


(a) beginners in biology, 
(b) research workers in biology. 


To achieve this objective, the text contains signposts in each chapter at which the 
former type of reader can give up and proceed to the next chapter. The effect is to 
produce a rather wide range of difficulty, and for the reader capable of understanding 
the distinction between the various mixed models in a split plot situation (p. 369) an 
exposition of the Student t-test for comparison of means that first takes two pages 
for the case of equal sample sizes (pp. 87-8) and then takes another two pages for the 
case of unequal sample sizes (pp. 90-1) must appear maddeningly pedestrian. One is 
tempted to raise the question of whether a student for whom this type of approach is 
necessary should be studying statistics, or studying biology, or studying. 

The general content, and the general approach, of Statistical Methods must be so 
well known to all statisticians that it must be entirely redundant either to summarize 
or to comment upon them. Those statisticians who find this approach satisfactory for 
the classes they have to teach, or the clients they have to advise, are grateful for the 
existence of this text, since it appears to remain much the best of its type despite 
many attempted imitations. Those statisticians who have better prepared students, 
or more sophisticated clients, are grateful, or should be, for this fact. 

The publicity announcing this Fifth Edition does not, I think, give an adequate 
impression of the extent to which this edition has been revised and enlarged. Physi- 
cally, the volume contains 49 more pages. There are now sections dealing with the 
following topics: multiple comparisons (following Tukey’s unpublished procedure);
Lord’s studentized range; the sign test; Wilcoxon tests, for unpaired observations and 
for paired observations; regression through the origin with standard deviation of the 
dependent variable proportional to the independent variable; correction for bias of 
treatment mean squares in substitution of missing values; Tukey’s test for additivity; 
Satterthwaite’s treatment of degrees of freedom of linear combinations of mean 
squares. Further, there has been an extensive rewriting and rearrangement of parts 
of the original text. The last chapter in the book, on DESIGN AND ANALYSIS OF SAMPLINGS,
which took 24 pages in the Fourth Edition, is now the responsibility of
William G. Cochran, who uses 35 pages. 

In the new edition, the chapter on two-way analysis of variance is followed by a
chapter on COMPARISONS: FACTORIAL ARRANGEMENTS OF TREATMENTS. The former
part is derived from the chapter on INDIVIDUAL DEGREES OF FREEDOM which in the
old edition came after multiple and curvilinear regression. The latter part, on factorial
arrangements, is substantially new, and contains a much more modern approach to 
expectations of mean squares in analysis of variance, reflecting the progress of the 
past decade. The revisions and additions here are the most valuable aspect of the new 
edition. 

Modifications to the tables of statistical functions include a more extensive chi-square,
the addition of 25, 10, 2½, and ½ percentage points of F, and tables of the
Studentized range, orthogonal polynomial coefficients up to n=7, sums of ranks for
the two Wilcoxon tests, and Lord’s functions for the Studentized range analog of the
t-test (this latter table, incidentally, is not listed in the Index of Tables).

A comment on omissions is not to be construed as a criticism, for Snedecor’s choice 
of topics reflects his experience as to what is most desirable for his intended reader and 
this experience must be accepted. Further, the book already contains so much it is 
ungracious to ask for more. It is surprising, nevertheless, to observe that incomplete 
block designs, for example, balanced incomplete blocks and Youden squares and con- 
founding arrangements for factorial experiments, are omitted. A greater emphasis 
upon the actual physical operation of randomization would probably be desirable in 
emphasizing its absolute necessity. The fact that the arcsin transformation has a 
theoretical variance might have been mentioned. In view of the extensive use, for 
example in Chap. 10, of variance components, some mention might have been made 
of their tremendous variance, and still better, some method of estimating this, e.g., 
that of Bross, might have been outlined. The above topics would seem more interest- 
ing to a biologist than a test for kurtosis, for example. As regards the Index, it appears 
to be satisfactory in general, though I could not find any mention of the determination 
of confidence limits for a variance, which actually does appear in the text. 

It seems incorrect to attribute (p. 117) the unequal sample size unpaired Wilcoxon 
test to White (Biometrics, 1952), as this topic had certainly been covered by Festinger 
(Psychometrika, 1946) and Mann and Whitney (Annals of Mathematical Statistics, 
1947) several years earlier. 

In Tables 9.9.1 and .2, the observed sample fractions are labelled “Probability of 
Survival” which would lead the reader to confuse sample values with population 
parameters. 

Perhaps influenced by his personal experience in which “many biological variables 
whose distributions are approximately normal, such as heights of men, for example, 
or lengths of ears of corn, or dressing percentages of swine” may have occurred frequently,
Snedecor would be judged by some to be somewhat ungenerous to non-parametric
tests. One chapter is headed “SHORT CUTS AND APPROXIMATIONS: LESS
THAN EFFICIENT AND NON-PARAMETRIC METHODS” which, whatever the intention,
certainly will have the tendency to smear non-parametric tests with guilt by
association with inefficient and approximate methods. That we are not reading too 
much into this title is substantiated by such sentences as “If your data aren’t worth 
much, the short-cut method may be good enough.” 

The point, of course, is that non-parametric tests are designed to function under 
weaker assumptions. It is purely incidental whether a particular non-parametric test 
is shorter to apply than its parametric analog: some of them are, but others are not. 
For example, Lord’s studentized range analog of the t-test is parametric and very 
quick; the permutation analog of the t-test is non-parametric and exceedingly lengthy. 
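
The labor involved in the permutation analog may be seen from a minimal sketch, assuming an exact two-sided test on the difference of two sample means. The data are hypothetical and the function name `permutation_test_mean_diff` is illustrative only; nothing here is taken from the text under review.

```python
from itertools import combinations
from math import comb


def permutation_test_mean_diff(a, b):
    """Exact two-sided permutation test for a difference in means.

    Every way of splitting the pooled observations into groups of
    len(a) and len(b) is enumerated; the p-value is the share of splits
    whose absolute mean difference is at least the observed one.
    """
    pooled = list(a) + list(b)
    m = len(a)
    n = len(pooled) - m
    total = sum(pooled)
    observed = abs(sum(a) / m - sum(b) / n)
    extreme = 0
    for idx in combinations(range(len(pooled)), m):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / m - (total - s) / n)
        if diff >= observed - 1e-12:   # small tolerance for rounding
            extreme += 1
    return extreme / comb(len(pooled), m)


# Hypothetical data, chosen only for illustration.
p_value = permutation_test_mean_diff([1, 2, 3], [4, 5, 6])

# Number of splits that must be examined for two samples of ten each.
splits_for_two_samples_of_ten = comb(20, 10)
```

No distributional form is assumed, but the number of rearrangements to be examined grows very rapidly with the sample sizes, which is the sense in which the method is lengthy.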

I do not find the following section particularly satisfactory: “. . . , the efficiency of 
t-test decreases with anormality of the sampled population. If the population is known 
to be far from normal, there would perhaps be more information in the rank test and 
at less cost” (i.e., with easier calculations). 
“Efficiency” of a test is not defined, not surprisingly, perhaps, since there are many
subtleties to be considered in providing a satisfactory definition. If the word is being
used loosely for “power,” I doubt the general truth of the statement. For example,
Ghurye (Biometrika, 36, 426-30, 1949) concludes that for a one-sided t-test applied to
a certain positively skewed population, the real α will be smaller than the nominal
and the outer part of the power curve lies above the corresponding power curve for
the normal distribution. In the second sentence quoted above, presumably the word
“information” is not being used in any of its several technical senses but I cannot
guess at what is intended. The point which should be made, of course, is that for the
t-test in these circumstances we, in general, have little idea as to the real value of α
compared with the nominal, whereas the rank test will have its α correct.

The weakest point in this text continues to be its treatment of the concept of 
power of a test. The word “power” does not appear in the Index. “Error, kind, second, 
Type II” receives one reference, which turns out to be the standard four-line defini- 
tion, followed by references to three other texts. “Size of experiment or sample” is 
more generously treated, but to the reviewer those sections are unnecessarily vague 
and simultaneously oversimplified and over-complex. If one wants to formulate a 
recipe, it can be done better than it is here. 

The volume contains points of worldly wisdom of great value: two samples— 

“A stupendous amount of time has been wasted in ill-advised curve fitting. Only 
when the end in view is clear should the task be undertaken.” 

“Finally, if the experiment is of any great value, it leads to new hypotheses which, 
before conclusions are reached, must be tested either by new experiments or by their 
agreement with the known structure of science. It is the investigator’s responsibility 
to integrate all this evidence and to make a decision. He cannot evade this responsi- 
bility by citing a value of chi-square. The probability that he will reach a false con- 
clusion is presumably much less than his errors of the first kind.” 


It is clear that Statistical Methods, which was already the best text of its kind, has 
undergone substantial improvement, and it can therefore be highly recommended to 
its potential readers. 


Facts from Figures. Third Edition. M. J. Moroney. Baltimore: Penguin Books, Inc., 1956. 
Pp. 472. $0.95. 
Although this is the third edition and fourth printing of this popular little volume,
few changes have been made since it was reviewed in the Journal of the American
Statistical Association (September 1953). The latest change is in Chap. 16 on “Correlation,
Cause and Effect,” where Moroney introduces Hotelling’s T^2 test in his discussion
of discriminatory analyses, but uses for illustration the problems of the original
volume. This edition uses better quality paper, making a smaller book out of
the same number of pages, which yet sells at a lower price. 

D. D. F. 


Applied General Statistics, Second Edition. Frederick E. Croxton and Dudley J. Cowden.
New York: Prentice-Hall, 1955. Pp. xvi, 843. $9.00. 


E. F. Beach, McGill University


This book was first published in 1939 and, together with a briefer version, entitled
Practical Business Statistics, has attained a very wide acceptance as an introductory 
textbook. 

The title is not quite correct inasmuch as there is a fairly heavy weighting with 
business and economic data, over two hundred pages devoted to time series and index 
numbers, plus another chapter given over to the correlation of time series, and some 
earlier chapters largely devoted to time series charts. Much of the illustrative mate- 
rial throughout is economic. 
This is not to detract from the usefulness of the book, but by way of showing for
whom it is most useful. The first edition has become one of the most widely used refer- 
ence books on the shelves of business libraries, and quite properly, for it has served 
its purpose well. 

The Preface states that this edition has been “largely rewritten.” The framework is, 
however, essentially the same. The chapter headings are much the same, with some 
re-wordings, and purpose and essential content the same. The chapters on sampling 
and reliability have been moved to the end of the book. Each chapter that uses sym- 
bols is preceded by a “symbol vocabulary” for that chapter. 

In this second edition, the Lorenz and Pareto curves are omitted from the chapter 
on the Frequency Distribution, and the Moving Average no longer appears as a 
method of describing secular trend. Some re-arrangement appears in some of the 
other chapters. Of course, the data used are more recent. 

One item may be discussed in more detail. Four chapters have been devoted to
correlation, and the word “regression” is not mentioned. The phrase “line of estima- 
tion” is introduced. This is interesting. First of all, in business and economic circles, 
it is the regression line that is the important item, with correlation coefficients used 
as measures of goodness of fit. There has been a very definite trend in this direction 
for some time. The Croxton and Cowden presentation stands out in contrast to this 
trend in interest. On the other hand, the elimination of that dreadful word “regres- 
sion” is a welcome change. Countless students have been troubled by this word, and 
their understanding of relatively simple processes interfered with. The expression 
“line of estimation” is an improvement, but it is not wholly satisfactory. Why not 
call it a “line of relationship,” which may sometimes be used as a “line of estimation”? 
There is a certain redundancy in the expression “line of relation” and some people 
would prefer the simple term “relation.” 

Thus the titles of these four chapters of Croxton and Cowden could retain the 
word “correlation” but they should be rewritten to emphasize the line of relation as 
the important element in the situation. Then the inference that four chapters are 
necessary to lead up to the measurement of correlation among series with various 
leads and lags will give way to the much more acceptable and understandable im- 
pression that business men want their economists and statisticians to help them in 
guessing what is ahead. In this very important work, time trends and multiple corre- 
lation are all part of the general question of finding relations among series that are 
to some extent projectable into the future. The finding of “lead” series and “lag” 
series is only a part of this—and in terms of statistical work done in business, a small 
part. 

There are several places where the language is misleading or inadequate for begin- 
ning students. One of the more serious is to be found at the top of p. 279. This sen- 
tence, “The trend should be fitted to a period running from B to B',” is followed a 
couple of lines below by, “The first and last years should not both be low points of 
particularly deep depressions. . . .” A glance at Chart 12.6 would show why students 
would be puzzled. On p. 270, the word “must” in the last line on the page is not correct.
On p. 633, there is a serious mistake in Chart 24.5. Apparently “σ = 60 Grams”
should read “σ = 6 Grams.”

Students have experienced difficulty with the following sections: (1) the standard 
deviation, (2) the calculation of numbers included on p. 220, (3) the adjusting of trend 
lines in Chapter 12, (4) the concepts of “explained” and “unexplained” variation in 
Chapter 19, and (5) the chapters on statistical inference. 

This last item is particularly important. Chapter 23 is a useful introduction, but 
in the following chapters formulas seem to appear suddenly from nowhere in particu- 
lar, with little introduction. Even those which are proved in an appendix should be 
given some verbal introduction and explanation in the text. 
Statistics for Economics and Business, Second Edition. Donald W. Paden and
E. F. Lindquist. New York: McGraw-Hill Book Company, Inc., 1956. Pp. vii, 305. 
$4.75. 


John I. Griffin, College of the City of New York


This is the second edition of a text designed for a one-semester undergraduate
course meeting three or more hours per week. The authors have made a real attempt
to reduce the subject matter to the irreducible minimum for students in economics
and business. The mathematical preparation assumed is elementary arithmetic
and ninth-year algebra. Furthermore, “No attempt [is] made to develop in the
student any degree of skill or facility in the computation of statistical measures.” The
student using this text is expected to devote the maximum amount of his time to 
“the consideration of the interpretative aspects of the course.” The content of the 
course is “presented in a form which is specially designed to encourage the student to 
think things out for himself and thus arrive at a reasoned understanding of statistical 
techniques.” These are very excellent objectives for any textbook writer. 

The authors, a professor of economics and a professor of education (Lindquist is 
the author of two other texts in statistics), state that the combination of the text and 
the Student Workbook constitute a “method of teaching” which is distinguished by the 
fact that “the student is encouraged to take an active part in the learning process and 
much of what has formerly been presented to him is (by means of the workbook) 
drawn out of him through leading questions and suggestive illustrations.” It is con- 
templated by the authors that the instructor in the course dispense almost entirely 
with formal lectures and that most of the time should be spent in supervised work 
on exercises drawn from the workbook. The workbook consists of 50 lessons, which 
call for a high degree of interpretation and a minimum of computation, and a large
number of multiple choice questions which may be used for review and testing pur- 
poses. The Student Workbook, consisting of 156 tear-out pages, costs $2.25. 

It would appear that the suggestion that the integration of text and workbook is a 
distinguishing feature of this work is an exaggeration. The workbook, although excel- 
lent, is quite traditional. Most courses in elementary statistics make every effort to 
apply principles and to “draw out” the student through the use of exercises and 
questions for discussion. In this respect the text and workbook considered in this 
review do not appear to be any more effective than a number of other such combina- 
tions already on the market. In fact, given a student body with such limited back- 
ground as the authors presuppose, it is probable that the instructor will find himself 
supplementing the text and lecturing to a greater extent than the authors would 
contemplate. 

The text itself is divided into 17 chapters. Four of these are devoted to the fre- 
quency distribution, its graphic representation, measures of central tendency and of 
variability. One chapter is concerned with the normal curve and three chapters with 
“sampling-error theory.” Time series analysis is allocated three chapters and correla- 
tion theory three chapters. Index numbers are introduced in the third chapter in the 
book as illustrative of weighting and use of percentages. The discussion of tabular 
and graphic presentation, which is so important for the student in economics and 
business, is scattered throughout the text. Thus, semi-logarithmic charts are pre- 
sented for the first time in Chapter 12; bar charts are encountered by the student on 
p. 198 and illustrations of their use are limited to time series data. Tabular presenta- 
tion is discussed in Chapter 2. 

In general, the text offers very explicit instructions for computing the various 
measures, in fact at points it would appear to be a self-teaching manual rather than a 
text designed for usual classroom use. The instructions, which take the form of a list
of steps to be followed, tend to smack of cookbook methods and to render less effective
the stated objective of the text to stress “the interpretation of statistical techniques
as they are applied to economics and business.” The brevity of the text, unfortunately,
leaves very little opportunity to discuss the techniques in a setting
which is relevant to the economics and business student. This is, of course, the price 
that has to be paid if a text is to be held to small dimensions and yet say something 
about all the usual statistical methods. 

At various points in the text unnecessarily complex statements are used. Thus, on 
p. 111 the normal curve of distribution is defined with the use of eleven lines of text 
and the usual formula, which might well leave the ill-prepared student dumbfounded 
or, at least, discouraged. Immediately after this definition, however, the authors 
write, “This definition, of course, will not be very meaningful to any student in this 
course who has not had advanced training in mathematics, nor is he advised to at- 
tempt to derive much meaning from it. It is presented here primarily in order to 
emphasize ... that the normal curve is essentially a mathematical ideal.” Would it 
not have been better to attempt an explanation without this shock treatment, es- 
pecially in view of the effective discussion in the following pages of the text? 

The student who conscientiously works his way through this text and workbook 
will encounter and learn to use much of the elementary machinery of statistics. It is 
to be doubted, however, whether the subject will come alive, especially for the stu- 
dent of economics and business. The time series discussion, for example, does not 
closely relate the techniques to business problems. A few additional pages, saved 
from the discussion of two methods rather than one for seasonal analysis, would have 
been helpful. Particularly in this section of the text the procedures are unduly em- 
phasized and their significance is not made clear. Some discussion of time series data, 
how they are collected, the problem of comparability over time, use of data based on 
sample survey procedures and such “real” problems, would assist in interpreta- 
tion. 

Despite this reviewer’s reservations in respect to the limits set upon themselves by 
the authors, this text is suitable for a minimum course in applied statistical methods. 
It is not, however, a distinctive approach and a serious question arises as to whether 
the minimum is really enough. An instructor who prefers to leave his students to 
work out the common techniques largely by themselves and who adds background 
by means of class lectures may find that this text and workbook are really quite 
satisfactory for his purposes. 


Workbook in Business and Economic Statistics. William A. Spurr. Homewood, Illinois: 
Richard D. Irwin, Inc., 1956. Pp. viii, 272. College price—$3.50. 


Gottfried E. Noether, Boston University


This workbook has been planned primarily to accompany Spurr, Kellogg, and
Smith, Business and Economic Statistics (reviewed in the preceding issue of this 
journal), and follows exactly the organization and terminology of that text. The 
main emphasis is on descriptive methods as applied to business and economic data. 
There are 150 problems which are further subdivided into some 700 parts. In addition 
to discussion and computational problems, there are true-and-false, completion, and 
matching problems. Space for calculations and answers, as well as 50 sheets of graph 
paper, are provided. All sheets are perforated. A short table of random numbers and 
a glossary of symbols complete the workbook.


Binomial, Normal and Poisson Probabilities. Ed Sinclair Smith. Bel Air, Md.: Ed S.
Smith, 1953. (Distributed by Author, Box 279, RD 2, Bel Air, Md.) Pp. 71 plus “addenda” 
sheet dated Dec. 1954. $2.50. Paper. 


Morton S. Raff, Bureau of Labor Statistics


This useful handbook provides tables and charts for determining easily any binomial
probability to within .001 of its true value, without the need for any other
reference material. There are a large number of original graphs, with extensive
discussion of the underlying ideas.

The principal emphasis is on the cumulative binomial probabilities, since these
pose the greatest obstacles to exact calculation.¹ The author’s basic strategy is set
forth on a “map” of np-space, which is divided into six regions with a different procedure
recommended for each. The region n ≤ 20, .01 ≤ p ≤ .5 is served by a table of
exact values, while other indicated regions are served by (2) the cumulative Poisson
distribution, (3) a Gram-Charlier refinement of same, (4) an area under the normal
curve, (5) a Gram-Charlier refinement of same, and (6) a further refinement of the
latter using a remainder term. The regions have been chosen so as to assure that the
error resulting from the use of the appropriate procedure will be less than .001 for any
value of c for which the true cumulative probability is between .001 and .999. Two
charts show the upper and lower limits of c as a function of n and p.
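
As a rough modern illustration of the kind of comparison described above, the following Python sketch computes the exact upper-tail probability B(c, n, p) (at least c successes in n trials) alongside the plain Poisson and normal approximations. It is not taken from the book under review: the function names are hypothetical, the normal approximation uses an ordinary continuity correction, and the Gram-Charlier refinements and the author's region boundaries are not reproduced.

```python
import math


def binom_upper_tail(c, n, p):
    """Exact B(c, n, p): probability of at least c successes in n trials."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1)
    )


def poisson_upper_tail(c, n, p):
    """Poisson approximation to B(c, n, p), using mean np."""
    m = n * p
    # Probability of fewer than c events under a Poisson law with mean m.
    below = sum(math.exp(-m) * m**k / math.factorial(k) for k in range(c))
    return 1.0 - below


def normal_upper_tail(c, n, p):
    """Normal approximation to B(c, n, p) with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (c - 0.5 - mean) / sd
    # Upper-tail area of the standard normal curve beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative parameter sets chosen for this sketch, not from the book.
    for c, n, p in [(3, 20, 0.05), (50, 100, 0.5)]:
        exact = binom_upper_tail(c, n, p)
        print(c, n, p,
              "exact:", round(exact, 4),
              "Poisson error:", round(poisson_upper_tail(c, n, p) - exact, 4),
              "normal error:", round(normal_upper_tail(c, n, p) - exact, 4))
```

Running the loop for different (c, n, p) gives a feel for why a single approximation cannot hold the error below .001 everywhere, which is presumably what motivates dividing np-space into regions.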

This book will be especially welcome to those who need to go beyond the limits of 
existing tables and to occasional users of binomial probabilities who may find the 
large books of tables too bulky or too expensive. While the primary use of the book is 
for computing numerical values, it also furnishes valuable insight into the behavior 
of the various approximating functions. 

In the realm of criticism the following points deserve mention. The Table of Con- 
tents does not include a detailed list of the numerous tables and figures. Some of the 
graphs are too complex for easy understanding. Terminology is not always consistent, 
and certain terms are used in a rather special way, like “plural” for ≥2 and “Gram-Charlier
series” for the first two terms thereof. And the discussion of “alternative
methods” (i.e., approximations which were considered for presentation but then re- 
jected) might well have mentioned the excellent Camp-Paulson approximation. 

A number of examples are given to illustrate the use of the various methods, but 
some of them are difficult to follow. It would have been helpful to use the same set of 
parameters for illustrating several different methods, instead of using a new set 
almost every time. In one of the cases where this was done, the author missed an 
excellent opportunity to point out that the second term of the Type B series (i.e. the 
Gram-Charlier modification of the Poisson) contributes nothing when c=np+1.

The “addenda” sheet furnished with the book corrects a number of the errors which 
are inevitable in a work of this kind, but several errors still remain. The most impor- 
tant of these are: (1) in Figure 4 (p. 15) the binomial curve for p=0.5 needs to 
be shifted one unit to the right; (2) in the top two lines on page 45 the expression 
(66,100,.03) ought to be (66,1300,.05); (3) Table C5 is labeled as covering p=.01(.01).5
instead of .01(.01).1(.05).5; (4) on page 71 the first minus sign in the expression for a 
ought to be a plus instead. 

Finally, the reviewer’s relation to this work calls for a brief comment, since reviewer 
and author worked independently on a good deal of this material with a remarkable 
similarity of approach.² While such parallel activity has often occurred in the history


¹ The notation B(c, n, p) is used for the probability of at least c successes in n independent trials with a
probability p of success at each trial.
² Cf. reviewer's paper “On approximating the point binomial” in this journal for June 1956, pp. 293ff.


of science, it is always a little surprising to find one’s self personally involved in such
a situation. The reviewer gladly acknowledges the priority of Smith’s work, and 
hopes that the book under review will receive the wide distribution it deserves. 


A Manual of Style. Eleventh Edition. Staff of the University of Chicago Press. Chicago, 
Illinois: The University of Chicago Press, 1956. Pp. x, 534. $5.50. 


Kenneth W. Haemer, American Telephone & Telegraph Company


This style manual is an old friend of authors, editors, and publishers. Now in its
eleventh edition, it has been expanding steadily in content and in usefulness for
more than fifty years. A reference book concerned entirely with printing and publish- 
ing style, it does not discuss writing style beyond punctuation. But it amply demon- 
strates the point that there is enough to say about the mechanics of styling books and 
other formal publications to fill a rather large volume: specifically, a volume of 534 
pages. 

The content is divided into six major parts: Planning a Book; Rules for Preparation
of Copy; Hints to Authors and Editors; Forms for Letter Writing;
Technical Terms, Symbols, and Numerals; and Specimens of Type. These parts
are supplemented by two indexes: a General Index, and an Index to the Type Specimens.

The section on Planning a Book discusses briefly (19 pages) the typographic considerations
that must be taken into account when preparing to write a book, and then
lists the component parts of a book and explains their form. The information in these 
pages is excellent, but doesn’t go quite far enough for this reviewer, who would par- 
ticularly like to see such components as title page, contents page, and text pages ex- 
plained by illustration as well as by description. 

Rules for Preparation of Copy is one of the larger sections of the manual (more
than 150 pages) and is a mine of information and counsel. The topics covered extend 
from capitalization, word division, and the use of italics, to tabular design and mathe- 
matical composition. As in all such matters, there are small points that could be ar- 
gued. For example, one or two of the tabular practices seem somewhat outmoded. 
Yet the general handling of tabular material is so expert that any criticism would be 
quibbling. 

The third section, Hints to Authors, Editors, Readers, although less than
fifty pages, is one of the more valuable parts of the book. The information on copy- 
rights is particularly helpful. Incidentally, the term readers, as used in this section, 
refers to proofreaders and copyreaders, whose services and responsibilities are briefly 
but adequately discussed. 

Forms for Letter Writing is such a short section (7 pages) on such a limited subject
that it might well have been included in the opening section along with spelling,
punctuation, and similar conventions of writing. If it is important enough to warrant 
a separate chapter, this material on forms of address could profitably be expanded to 
include a discussion of physical form for letter writing. 

The Glossary of Technical Terms is compact, clear, and concise. It is the best
brief explanation of printing and publishing terms that this reviewer has seen. Only 
one very small flaw was detected. Under “Paragraphs” two types are listed: regular 
(with the first line indented) and hanging (with all but the first line indented). No 
mention is made of flush paragraphs (no lines indented) which are widely used in 
letters and in short collections of quotations or excerpts. 

Although the glossary is excellent, and properly includes such trade terms as recto, 
verso, and folio, this reviewer doubts the wisdom, or virtue, of blandly using these
terms throughout the book, with the implication not only that authors must learn
them but also that they must use them. It is certainly desirable for authors to use
such terms as Ben Day, bleed, pica, or point; and to understand such terms as lead- 
ing, font, mortise, or signature; but suggesting that they use verso instead of left 
hand page, recto instead of right hand page, and folio instead of page number gives 
the manual a tinge of pedantry that seems out of place in the midst of its general 
tone of practical common sense. 

The last major section is a large collection of Type Specimens. It presents 200 
pages of old and new type faces, ordered alphabetically. Most of these are shown in 
several sizes and many in two line-spacings: set solid, and with extra space between 
lines. The current revision also includes cpp measures (number of characters per pica 
of line width) which further aid the designer in estimating the amount of space a
given amount of copy will require. The selection of types has been made with knowl- 
edge and discrimination, and consequently this part of the manual is undoubtedly 
valuable for book designers and editors. But it seems unlikely that this material is 
of very much use, or interest, to authors. If this reviewer’s opinion were asked, he 
would suggest omitting this section and publishing it, in enlarged form, as a separate 
volume. It is not thorough or complete enough to satisfy anyone deeply interested in 
book design and production, yet is a much larger catalogue than authors need or will 
make use of. If some of these pages were used to expand the preceding sections, this 
manual’s value to authors would be increased. 

The physical makeup of the book is excellent. As is fitting in a manual about typo- 
graphic style, the typography is clean and crisp throughout. One small speck on an 
otherwise spotless product: the page numbers, placed in the bottom outside corners, 
are too similar in style, size, and weight to the paragraph numbers, with which they 
align on all left-hand pages. Reading down the side of the page and finding 369, 370, 371, 
followed by 216, causes just enough surprise to be a disturbance. But this is certainly 
a small matter. In fact this flaw, like the few others already mentioned, is noticeable 
only because it is in contrast to the general excellence of the book as a whole. 

This style manual should be a boon to authors of technical or scholarly works, es- 
pecially to writers who have not appeared in print before. It answers many questions 
that writers are aware of, and many more that they should be aware of. If new writers 
and experienced authors earnestly make use of the information and guidance it 
offers, this manual of style will also be a godsend to editors, publishers, and printers.


A Manual for Writers of Term Papers, Theses, and Dissertations, Revised Edition. 
Kate L. Turabian. Chicago: The University of Chicago Press, 1955. Pp. v, 82. $1.25. 
Paper. 


Kenneth W. Haemer, American Telephone & Telegraph Company


An explanation on the cover of this manual states that it “. . . serves as an authoritative
guide to scholarly style for typewritten reports of research in both scientific
and nonscientific fields, covering matters of format, footnotes, and bibliography,
use of quotations, tables, and illustrations, clarifying points of form by numerous 
examples and sample pages.” 

This description adequately suggests the content of the manual. The key words 
to note are scholarly, typewritten, and form. In effect, this booklet is a very much ab- 
breviated—and truncated—version of The University of Chicago Press publication 
A Manual of Style, adapted to the limitations of the typewriter. It is indeed focused 
on the scholarly style: more than a third of the body of the manual is about footnotes. 
The discussion of footnotes and the bibliography together makes up half of the content
(34 of 63 pages). True to her promise, the author has placed almost the entire
emphasis on form. She has carefully limited the discussion to how, successfully resisting
the temptation to explain why.

A brief résumé of the entire content of this manual follows: Chapter I, The Format
of the Paper, is mistitled. Format is defined as “The shape, style, and general appearance
of a book as determined by type, margins, etc.” (Manual of Style, page 251,
The University of Chicago Press). This one-page opening chapter actually lists the
physical elements or component parts of a formal paper. Chapter II discusses The
Preliminaries, the “front matter” of a paper. Chapter III covers the form of the
Text or body of the report. These first three chapters, totaling 12 pages, are sound
and sensible and for the most part carefully and accurately written.

Chapter IV is the large section on Footnotes (24 pages) already mentioned. It 
discusses this topic under two major divisions: reference notes, and content notes, 
with most of the emphasis on the former. Inclusion of a great many carefully organ- 
ized examples helps to make this chapter an excellent reference. The list of recom- 
mended abbreviations does not include ibid., op. cit., and loc. cit., and for a moment 
this reviewer was cheered by the hope that they had finally been replaced by English 
terms. The scholarly threesome is discussed at length in the text, however, with no 
hint that they are anything but sacred. But perhaps the day will yet come when Latin 
reference terms are not needed as proof of scholarship. Certainly there is little excuse 
for not using English words that take no additional space: see requires no more space 
than cf., below no more than infra, and above no more than supra. 

Chapter V divides its ten pages between discussing and illustrating Tables. The
discussion is useful, but unfortunately some of the illustrations are not as impeccably
correct as a scholarly paper requires them to be. Four of the six display at least minor
violations of good practice. Chapter VI provides four pages of suggestions and directions
for presenting Illustrations. In general, this information is sound and helpful.
Chapter VII explains some of the ways in which the form of Scientific Papers
differs from that of other scholarly papers. Most of these five and a half pages discuss
footnotes. The final chapter, VIII, provides a paragraph about the Appendix and
five pages about the form of The Bibliography.

Appendix I, Typing Paper, and Appendix II, Some [...] of Punctuation,
complete the discussion. A third appendix shows several Sample Pages of a Paper.
An extensive and excellent topical Index completes the manual.

In general this manual was a disappointment to this reviewer, who perhaps ex- 
pected too much. It does mention a great many details of form and provides a large 
quantity of information in a small volume. Yet its instructions are often either too 
indefinite or too dogmatic. Example of the former (page 13): 


They [footnote numbers] may begin with “1”... on each page, or with “1” at the 
beginning of each chapter, or they may run in one series through the entire paper. 


Some further explanation, suggesting when to use which method, seems needed here. 
In the present form the reader has no basis for selection. 


Typical example of dogmatism (page 76): 
The list of illustrations should be placed on a separate page from the list of tables. 


Certainly there is no sensible reason—even in scholarly works—for insisting that 
these two lists must be on separate pages no matter how few items each contains. 

One other general criticism: the preoccupation with methods of documentation, 
although perhaps justified, gives the impression that form is at least as important as
substance. No doubt the author of this manual expects the reader to take for granted
the vital fact that no amount of itemization of the sources that were drawn upon
can substitute for lack of idea content in a scholarly paper. But this reviewer would
like to see the manual begin with a strong statement to that effect.

Nevertheless, Turabian has done the writers of theses and dissertations an invalu- 
able service in bringing together so much information about so many details that 
institutions of learning are so concerned about. Her problem was not an easy one to 
meet: unfortunately a style manual cannot please everyone, even when it is limited to 
scholarly presentation. 

Perhaps a fair judgment of this manual would be to say that it is undoubtedly 
very helpful to its intended audience, but, by being more specific in general matters 
and less dogmatic in specific matters, it could be considerably more helpful. 


Technical Publications, Their Purpose, Preparation, and Production. C. Baker. John Wiley 
and Sons, Inc., New York, 1955. Pp. xiii, 302. $6.00. 


Kenneth W. Haemer, American Telephone & Telegraph Company


This book unintentionally underlines the major problem facing the author of a
handbook on technical writing. Shall he direct his effort toward a rather narrow
group and talk to them in their own language about subject matter they know inti- 
mately, or shall he try to talk to a wider audience in more general terms? In this 
case the author aimed at a middle course. Yet, when he came to write the Preface, he 
found it necessary to say that “This book is primarily concerned with scientific occupations,
especially engineering.” To be more specific, it is concerned with aircraft engineering.
Added to this is the further qualification that it was written by a British
author who uses British diction and cites British references about British practice. 
Despite the broad applicability of much of the information it contains, therefore, 
this book is unlikely to have a large audience among non-engineers on this side of the 
ocean. Yet there is in it a great deal of practical wisdom that all technical writers
could profit by. The substantial chapter on meeting the reader’s needs is especially 
well developed, as is the chapter on copy writing. The chapter on the value of illustra- 
tions and its companion on graphic illustrations are thorough and seem competent, 
although here, particularly, the author’s concentration on the field of aircraft en- 
gineering is a barrier that discourages the interest and obstructs the understanding of 
non-engineers. On the other hand the chapters on reproduction and on preparing
and correcting copy are applicable in any area of technical publication. 
Here are a few of the author’s cogent comments, selected from different parts of 
his book: 
Listeners have an unfortunate habit of tiring before the lecturer. (Forms of Technical 
Publicity—Scientific Papers) 


Most technical publications are books in miniature, and when possible should be 
drafted in three main parts: the introduction, the main paragraphs and the conclusion. 
(The Use of Words) 


The fundamental fact that a technical publication should never tax the reader’s ability 
to interpret it but should leave his mind free to study its scientific message is as true 
of a mathematical treatise as it is of a practical instruction conveyed by words or 
drawings. (Copy Writing) 


Technical illustrations should never be included in the book if they merely repeat in 
technical form what has been said adequately in words. (The Value of Illustrations) 


It is essential for a technical author to have some knowledge of the principal methods
in use for reproducing both text and illustrations and to possess a more intimate acquaintance
with the systems he is most likely to employ. (Techniques of Reproduction)


Typewriting . . . is the obvious medium to most people, but there are still some who
. . . pass to their printer notes that are handwritten in pencil. These authors are the
first to complain of the results they receive and the prices they are charged. (Preparing, 
Proof Correcting and Producing) 


As these bits show clearly, the author has written a knowledgeable and helpful
guide to the preparation and production of technical publications. Unfortunately for 
most readers of this journal, it is a book written in the direction of engineers. 


Statistical Abstract of the United States, 1956. U. S. Department of Commerce, Bureau of
the Census. Prepared under the direction of Edwin D. Goldfield. Washington, D. C.: Government
Printing Office, 1956. Pp. xvi, 1049. $3.75.

“This year the text notes for the second half of the volume have been revised to
make them more comprehensive and consistent. This completes the revision of 
the text material begun last year. Ninety-seven new tables have been introduced, 
including one group devoted to an initial presentation of data on physical geography 
and a second group containing information from the 1954 Census of Business... .” 

“In addition to the 1954 Census of Business, the 1954 Censuses of Agriculture, 
Manufactures, and Mineral Industries are also represented. Further and more de- 
tailed statistics will be included in the next edition.” 

“Among tables shown in the last edition, 87 were omitted from this issue... . 
Some of the information in the omitted tables was absorbed in other tables still pres- 
ent.” 

W. A. W. 


Year Book of Labour Statistics, 1955. Fifteenth Issue. International Labour Office. Geneva,
1955. Pp. xv, 455. $5.00.


Charles D. Stewart, U. S. Department of Labor


This volume is, of course, a reference, not a book to read and hardly one to review.
Users of economic data, however, especially those concerned with economic development
or comparative structure, ought to be familiar with the resources it has to
offer. The ILO is a specialized agency of the United Nations, and this volume is the 
specialized handbook in the field of labor statistics, in the series of UN statistical 
yearbooks and reports. 

In format and presentation the volume is of the quality we have come to expect 
from the statistical offices of the international agencies. The result is deceptive. The 
mass of economic data, systematically organized and neatly printed, covering 455 
large pages, gives an exaggerated impression of progress in economic reporting. The 
editors carefully point out, in prefatory notes to each section, that “international 
comparability is subject to certain reservations,” and refer users to technical, de- 
scriptive sources. The scope of each series is carefully indicated, but the reader is left 
to his own sophistication and knowledge as to the statistical validity of the data. For 
this the editors can hardly be blamed. 

The over-all achievement, however, is not to be minimized. All of the major series 
in the field of labor statistics, except the relatively scattered and intermittent data 
on productivity, and except for Russia, are made readily accessible, to a degree rare 
in the official publications of the individual countries. Absolute figures are given in all 
instances, as in the original sources; indexes (generally on a 1948 base) have been
constructed wherever appropriate for time-series and comparative purposes. In- 
dustry detail is reconciled to the degree possible in terms of divisions and major 
groups of the International Standard Industrial Classification. Annual data are 
shown for the 3 pre-war years and for years since 1948, with monthly or quarterly
data usually from 1952 or 1953. General differences in the character of the various
national series, whether from sample surveys, establishment reports, or adminis- 
trative sources, are ingeniously indicated in column heads; variations in scope, etc., 
are carefully footnoted. All of the introductory texts for the 11 sections are in English, 
French and Spanish. 


Economic Handbook, A Visual Survey. Charles B. Fowler, John I. Griffin, Jerome B. 
Cohen, Joseph Cropsey, William I. Greenwald, and Frederick Sethur. New York: Thomas 
Y. Crowell Co., 1955. Pp. x, 246. $2.45. Paper. 


Marion Hamilton Gillim, Barnard College


This highly useful book treats sixty topics classified under the following ten
headings familiar to every student and instructor of introductory economics: 
National Income, Resources and Industries, Labor Force and Labor Economics, 
American Enterprise System, Money and Banking, Prices, Public Finance, Inter- 
national Economics, Private and Social Insurance, and Consumption and the Stand- 
ard of Living. It is essentially a handbook of the economy of the United States with 
fifty topics concerned with our internal economy and six with our foreign economic 
relations. Only four topics deal specifically with other parts of the earth: World 
Petroleum, Work Time Required to Buy Food, the Canadian Economy, and Trading 
Worlds and Population Worlds. 

Each topic receives a uniform four-page treatment consisting of a page of descrip- 
tive text, a page of questions for study and testing, a table, and a chart. The page of 
questions with the table on its reverse side has a space for the student’s name and a 
dotted line along its inner margin suggesting that it is to be torn from the book and 
turned in as a class assignment. The final pages of the book provide a tabular cross- 
reference to sixteen well-known introductory textbooks. 

This volume can contribute significantly to the knowledge, convenience, and inter- 
est of the beginning student of economics. The charts are done in red, green, and 
blue, as well as contrasting lines and shadings, and are attractive and easy to read. 
This handy book brings together important information from a variety of governmental
and private sources. It should discourage the too-frequent attempts of first-year
students to discuss current economic problems without making use of available
factual materials. The book also offers practice in the interpretation of index
numbers and a variety of types of charts including semilogarithmic charts, pie charts, 
bar charts, and statistical maps. 

When the time comes to revise this book, a few changes might make it even more 
useful than it is now. If the questions were grouped together in a final section, instead 
of being printed on the reverse of the pages containing the tables, the student would 
be able to remove the question sheets and still keep the tables and charts together 
for future reference. More specific citation of the sources of the data by reference to 
the titles of publications, instead of only to the names of the issuing agencies as at 
present, would aid the student looking for current statistics in each series. As is done 
in the charts of time series, room might be left in the tables for writing in data more 
recent than those available at the time of publication. The encouragement to the 
student to seek the latest statistics himself would help make him familiar with many 
of the publications which are the original sources of economic data. The addition of 
a brief, non-technical explanation of the methods used by each agency in preparing 
each group of statistics would assist the student in interpreting them. Among statistics 
which might well be added are series showing farm prices and parity indexes, family 
expenditures by major categories, population and labor force by age and sex, and 
average hours of work. 


The book does much to fill a long-standing need of students in introductory economics
for readily accessible data. Its authors and publisher can further the improvement
of the teaching of the first course in economics by keeping this book up-to-date
with periodic revisions.


Handbook of Commercial, Financial and Information Services. Compiled by Walter Haus- 
dorfer. New York: Special Libraries Association, 1956. Pp. 240. Paper. $5.00. 

According to the publisher’s announcement, this fifth edition “describes in considerable
detail 776 organizations which supply specialized information on a contract
or demand basis. Investment, taxation, market research, insurance, utility, advertis- 
ing, transportation, and many other specialized agencies in the United States, Canada 
and abroad are covered. .. . Each entry includes address, periodical and book pub- 
lications, scope and types of services offered, and, when furnished, price and charges. 
All listings are based on authoritative data supplied by the organizations themselves.” 

W. A. W. 


World Weights and Measures, Provisional Edition. Statistical Standards, Series M, No.
20. New York: Statistical Office, United Nations, 1955. Pp. 225. $2.00. 


Charles L. Richard, San Clemente, California


Although this work is presented primarily as a cambist for statisticians, it has
also considerable value as a comprehensive and convenient reference volume for 
persons or agencies concerned with practical aspects of international trade and as a 
guide for customs, consular, or other officials in their appraisals or computations of 
quantity values. The Section devoted to Unit [...] of Selected Commodities
presents information not usually found in earlier treatises of comparable character
but frequently required in connection with problems of cargo stowage, material
handling, or warehousing and supply or transportation logistics.


British Incomes and Savings. H. F. Lydall. Oxford: Basil Blackwell, 1955. Pp. xii, 274. 
32s. 6d. 


Milton Gilbert, Organization for European Economic Cooperation, Paris


While we in the United States have come to regard the Federal Reserve Board’s
surveys of consumer finances as a regular part of our statistical source materials,
such information is still quite rare in Europe. It is very welcome, therefore, that the 
Oxford Institute of Statistics has taken an initiative in this field and produced a 
study which will be widely used both for the mass of new material it contains and 
for the full description it provides of the techniques of the survey itself. Although a 
good bit of the data appeared earlier as articles in the Institute’s “Bulletin,” it is very 
helpful to have the full study in this convenient form which can be more widely cir- 
culated. 

In the main, the volume is a straightforward and well organized report of an inter- 
view survey conducted in 1952 of a sample of some 2,600 income units in Great 
Britain. With the experience of the Michigan Survey Research Center to call upon, 
the study was well designed technically and shows much care and good sense in the 
choice of data collected. In addition to size and type of income, the survey included 
current liquid savings by type, liquid asset holdings, taxes paid, and durable-goods 
purchases, as well as a variety of social information such as age, family composition, 
occupation, rent paid, and ownership of durable goods. All this material is appro- 
priately, and even imaginatively, classified and cross-classified and presented clearly 
in a series of chapters covering income distribution, ownership of assets, personal 
saving and consumption, determinants of saving, and household income, rent and
rates. 

Lydall avails himself of the opportunity of comparing the survey’s results with the 
aggregates given by other sources and is very helpful in giving a rather frank assess- 
ment of the reliability of his estimates. As might be expected, the survey’s coverage 
of investment income is substantially under the total from the National Income Blue 
Book but both employment income and entrepreneurial earnings are quite accurate. 
Similarly, as the income reported in the highest income brackets is too low, there is a 
shortage in the coverage of savings. Nonetheless, while the survey sampled a smaller 
universe, it uncovered about 70% of the larger components of saving, though for the 
smaller components it comes out in the neighborhood of 50%. As I am so often told 
that Europeans are much more reluctant than Americans to give statistical informa- 
tion, it is interesting to see that direct refusals were 16% of the planned sample with 
another 9% of non-responses being due to illness, absence, lack of information, and 
other causes. It seems to me that these results are very good for a first survey of this 
kind, taken by a private institution, and should be encouraging to European statisti- 
cians. 

The results of the survey are too extensive even to be listed here; it must suffice to 
say that they are of great interest not only in a descriptive sense but for their perti- 
nence to important policy problems, such as the fiscal system and post-war housing 
policy. I would like to call attention, however, to Lydall’s provocative analysis of the 
determinants of saving. He shows the primary importance of income level in the 
savings distribution, and then discusses the influence of other factors, such as change 
in income, windfall receipts, ownership of liquid assets, expenditure on durables, and 
unusual expense, both within income classes and over the distribution as a whole. 
Comparison is made with data for the United States which indicates that the pro- 
pensity to save was much lower in Britain at the time, though the shape of the saving- 
income relation is quite similar over the income range from 0.5 to 3 times the mean 
gross income. At higher income levels the British saving propensity falls off rapidly 
compared to the United States because of the much more progressive tax structure. 

It remains to be seen how much of this comparison is indicative of enduring rela- 
tionships, inasmuch as the period of the survey was a peculiar one for Britain. Since 
prices were rising sharply and real income probably declining, one would expect sav- 
ings to be depressed. On the other hand, durable-goods purchases were still at quite 
a low level, which would work in the opposite direction. In any case, it does appear
from aggregate data that savings rose substantially in later years. 

It is to be hoped, therefore, that this work is to be put on a continuing basis as 
both the reliability and analytical usefulness of the data are enormously improved
when they are collected over a span of years. 


Policies to Combat Depressions. A Conference of the Universities-National Bureau Com- 
mittee for Economic Research. Princeton: Princeton University Press, 1956. Pp. x, 417. 
$8.50. 


Charls E. Walker, Federal Reserve Bank of Dallas


THIS book consists of fourteen essays that grew out of two conferences sponsored
by the Universities-National Bureau Committee for Economic Research at
Princeton in October 1953 and May 1954. The purpose of the conferences (as stated
by Herbert Stein in his introduction to the volume) was “to survey the existing state
of readiness to deal with the problem of depression, in terms both of understanding
the problem and of the availability of instruments to deal with it.”

Papers by R. Gordon (“Types of Depressions and Programs to Combat Them”), 
B. Caplan (“A Case Study: The 1948-1949 Recession”), and K. Boulding (“Structure
and Stability: The Economics of the Next Adjustment”) are evidently intended to 
provide the general theoretical and descriptive base for the more specialized discus- 
sions that follow. Unfortunately, these papers—although valuable contributions for 
other reasons—fail to bridge the gap between description and prescription. Gordon’s 
paper is most fruitful in this respect, but as Bratt points out, the similarity of develop- 
ments in the early stages of each type of recession may well preclude the early and 
effective selection of alternative remedies that Gordon recommends. Caplan’s excel- 
lent analysis of the 1948-1949 recession is marred by the emptiness of his concluding 
recommendation: “The answer [to the problem of short-range fluctuations] must lie
in the continual improvement of basic long-range policies on the part of government, 
business, labor, and the public generally. The better these policies are—and they 
have been improved—the less significant may become the short-range fluctuations” 
(p. 52). Few would quarrel with this statement, but what does it tell us with respect 
to the problem at hand? Similarly, Boulding, who rightly emphasizes the importance 
of structural maladjustments and the need for different policies to deal with different 
types of recessions and depressions, does not add significantly to our knowledge of 
how to combat depression by concluding: “Thus the time may now be ripe for a new 
generation of economists to turn their attention to the structure of the economic ag- 
gregates, with a view both to elucidating what subaggregates are essential parts of 
the economic information system and to collecting information about them” (p. 74). 

Certain aspects of the built-in flexibility of government finance are examined by 
D. Lusher (“The Stabilizing Effectiveness of Budget Flexibility”), J. Pechman 
(“Yield of the Individual Income Tax during a Recession”), R. Goode (“The Corp- 
orate Income Tax in a Depression”), C. Heer (“Stabilizing State and Local Finance”), 
I. Merriam (“Social Security Programs and Economic Stability”), and K. Fox (“The 
Contribution of Farm Price Support Programs to General Economic Stability”). 
These essays are the major contribution of the volume. However, the modesty of the 
claims of several of the authors for the particular instrument or arrangement dis- 
cussed by each may be misleading from the standpoint of their net effect on aggregate 
demand. The combined effect of budget flexibility and the other stabilizers might be 
significant indeed. Moreover, models may understate the effectiveness of the built-in 
stabilizers to the extent that growing public knowledge of and belief in the stabilizers 
may contribute to increased business and consumer confidence, which might in turn 
promote greater stability in aggregate demand. 

Additional papers by L. Grebler, W. Owen, D. Johnson, and R. Triffin discuss 
housing, public works, international commodity programs, and international mone-
tary arrangements. Advocates of monetary policy as an anti-cyclical weapon, and
particularly those (such as this reviewer) who believe that its effectiveness as an
anti-recessionary instrument has been underestimated, will be disappointed that,
except for a few scattered references, its discussion is limited to a brief summary of a 
paper by R. Roosa. 

The contributions of the papers far outweigh the shortcomings. The volume de- 
serves the critical examination of both economists and policy officials. 


Studies in Inter-Sectoral Relations. P. Nørregaard Rasmussen. Amsterdam: North-
Holland Publishing Company, 1956. Pp. 217. $6.00. Paper.


Wm. C. Hood, University of Toronto


THIS volume is an excursion into three methodological problems of inter-industry,
input-output analysis. The methodology is illustrated in each case with empirical
data applying for the most part to Denmark. The author is anxious to emphasize,
however, that his interest is in method and that neither his data nor his instrument of
analysis would yet warrant an attempt to answer practical questions in applied 
economics. 

As is general in input-output analysis, the economy is considered to be divided 
into a number of sectors or industries. In each industry the primary inputs, labor and 
capital, are combined with imports and outputs of domestic industries to yield out-
put which may be distributed to domestic industries for further processing or to
“final users,” either domestic or foreign. The problems with which he deals are the 
following. 

First, what is the effect of changes in prices of primary inputs and imports on 
prices of the outputs of domestic industries, or conversely what is the effect on prices 
of primary inputs of changes in the prices of imports and domestic outputs? As one 
of several related problems he considers the impact on the “cost of living” (the sum of 
weighted output prices with the weight for each industry being its contribution to 
the total domestic supply of consumers’ goods) of changes in prices of imports and 
prices of primary inputs. 

Second, how does a change in the relative prices of their outputs affect the dis- 
tribution among industries of the incomes earned by the primary factors employed 
in them? A subsidiary question concerns necessary conditions for a zero transfer of 
income among industries in the face of a change in relative output prices. 

Third, how may “structural change” be conceived and measured? 

In dealing with the first problem it is assumed that the output of each industry is 
homogeneous and that the price of each industry’s output is the same irrespective of 
its destination. In similar vein, inputs of each primary factor and of imports are as- 
sumed to be homogeneous and to command the same price irrespective of the industry 
in which they are employed. Moreover it is assumed that volumes of inputs and out- 
puts do not respond to changes in prices, that input-output coefficients are constant, 
and that there are no limitations on the supplies of factors and no alternative produc- 
tion processes for any class of output. On these assumptions, and using illustrative 
Danish data for twenty-one industries, calculations of the effects on the prices of the 
outputs of each industry of a ten percent increase in each of average factor payments 
and average import prices are shown along with the effects on the “cost of living.” 
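
The calculation described above can be sketched in a few lines. This is only an illustration under the stated assumptions (fixed coefficients, one price per industry, costs passed on in full); the three-industry coefficients, cost shares, and "cost of living" weights are hypothetical and are not taken from the Danish tables. Base prices are normalized to one, so each industry's intermediate, import, and factor costs per unit of output sum to one.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def output_prices(coeffs, factor_cost, import_cost):
    """Prices p satisfying p_j = sum_i a_ij * p_i + factor_cost_j + import_cost_j."""
    n = len(coeffs)
    # Row j of (I - A transposed) holds the price equation of industry j.
    system = [[(1.0 if i == j else 0.0) - coeffs[i][j] for i in range(n)]
              for j in range(n)]
    unit_cost = [f + m for f, m in zip(factor_cost, import_cost)]
    return solve(system, unit_cost)


# Hypothetical data: A[i][j] is the input from industry i per unit output of j.
A = [[0.1, 0.2, 0.0],
     [0.3, 0.1, 0.2],
     [0.1, 0.1, 0.1]]
factors = [0.4, 0.4, 0.5]   # factor payments per unit of output
imports = [0.1, 0.2, 0.2]   # import costs per unit of output
weights = [0.5, 0.3, 0.2]   # assumed shares in consumers' goods supply


def cost_of_living(prices):
    """Weighted sum of output prices."""
    return sum(w * p for w, p in zip(weights, prices))


base = output_prices(A, factors, imports)
dearer_factors = output_prices(A, [1.1 * f for f in factors], imports)
dearer_imports = output_prices(A, factors, [1.1 * m for m in imports])

# Price rise in each industry from a ten percent increase in one cost element.
factor_effect = [p - b for p, b in zip(dearer_factors, base)]
import_effect = [p - b for p, b in zip(dearer_imports, base)]

print("factor effect:", factor_effect, cost_of_living(dearer_factors))
print("import effect:", import_effect, cost_of_living(dearer_imports))
```

Because the price equations are linear, the two effects for any industry add up to ten percent, which is a convenient check on the arithmetic.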

In treating the second problem, the transfer of income to any industry as between 
two periods is taken to be its gain arising from changes in prices of its exports and im- 
ports plus transfers of income from other industries. The former is defined as the cur- 
rent dollar value less the constant dollar value of the difference between its exports 
and imports in period two. The latter is defined as the value in current dollars of its 
factor payments in period two, apart from gains from changes in the prices of its 
exports and imports, less its factor payments in period two valued at the average 
increase in factor prices over all industries. This average price increase is measured 
as the ratio in period two of the current dollar value of factor payments in the eco- 
nomy (apart from the gain or loss in the economy from changes in export and import 
prices) to the constant dollar value of all factor payments. To measure these trans- 
fers, thus defined, it is necessary to have the relevant data from input-output tables 
for two years in current dollars and for one of the years to have these flows expressed 
in the prices of the other year. Such tables were available for Denmark and calcula- 
tions of the two classes of transfer for each of twenty-one industries for 1947 as com- 
pared with 1949 are presented. On rather more heroic assumptions, estimates for 
Norway and the United Kingdom are also shown. 
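
The two classes of transfer can be written out as simple arithmetic. The sketch below follows the verbal definitions in the paragraph above as this reader understands them; the function name and the three-industry figures are hypothetical, and "current" and "constant" refer to period-two flows valued at period-two and base-period prices respectively.

```python
def income_transfers(factor_cur, factor_const, net_exports_cur, net_exports_const):
    """Return (trade gains, average factor-price ratio, inter-industry transfers)."""
    # Gain from export and import price changes: current less constant value
    # of the difference between exports and imports in period two.
    trade_gain = [c - k for c, k in zip(net_exports_cur, net_exports_const)]
    # Factor payments in current prices, apart from the trade gains.
    adjusted = [f - g for f, g in zip(factor_cur, trade_gain)]
    # Average increase in factor prices over all industries.
    avg_price_ratio = sum(adjusted) / sum(factor_const)
    # Transfer from other industries: adjusted payments less constant-price
    # payments revalued at the average factor-price ratio.
    transfers = [a - avg_price_ratio * k for a, k in zip(adjusted, factor_const)]
    return trade_gain, avg_price_ratio, transfers


# Hypothetical period-two figures for three industries.
factor_cur = [120.0, 90.0, 60.0]        # factor payments, current prices
factor_const = [100.0, 80.0, 50.0]      # factor payments, base-period prices
net_exports_cur = [10.0, -5.0, 0.0]     # exports less imports, current prices
net_exports_const = [8.0, -6.0, 0.0]    # exports less imports, base-period prices

trade_gain, avg_price_ratio, transfers = income_transfers(
    factor_cur, factor_const, net_exports_cur, net_exports_const)
print(trade_gain, avg_price_ratio, transfers)
```

By construction the inter-industry transfers sum to zero over the whole economy: what one industry gains in this sense, the others lose.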

Several measures of structural change are proposed and estimated from Danish 
data for 1947 and 1949. In all cases the data are corrected for changes in prices. Let 
$A^0$ and $A^1$ represent the matrices of input-output coefficients and $X^0$ and $X^1$ represent
the corresponding matrices of inter-industry flows respectively, in two years 0 and 1.
One set of measures of structural change characterizes the distribution of weighted
relative differences between the corresponding coefficients $a_{ij}^0$ and $a_{ij}^1$, the
weights being based on the flows $(X_{ij}^0 + X_{ij}^1)$.
Another measure discussed is the set of deviations of industry outputs for year 1, as
reflected directly in the data, from those calculated by multiplying the matrix
$[I - A^0]^{-1}$ into the vector of “final” outputs for year 1. For the year 0, say, $[I - A^0]^{-1}_{ij}$
represents the increase in the output of industry $i$ per unit increase in the final de-
mand for the product of industry $j$. The author proposes to measure the average in-
crease in the outputs of industries per unit increase in the final demand for the pro-
duct of $j$, and the standard deviation of these coefficients for given $j$. He also proposes
to measure the average increase in the output of $i$ corresponding to unit increases in
the final demand for the product of each industry and the standard deviation of
these coefficients for given $i$. Comparisons of these averages and standard deviations
for different years provide measures of structural change. Similar measures are pro-
posed for comparing matrices of the form $\{[I - A]^{-1} - I\}$, in which the $ij$th element re-
presents the indirect effect on the output of $i$ of an increase of one unit in the final
demand for the product of $j$. Several variations on this theme are discussed.
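
A small numerical sketch may make these measures concrete. It assumes two hypothetical two-industry coefficient matrices, uses the population standard deviation (the source does not say which form the author uses), and is not a reproduction of the author's calculations.

```python
from statistics import mean, pstdev


def leontief_inverse(coeffs):
    """Invert (I - A) by Gauss-Jordan elimination with partial pivoting."""
    n = len(coeffs)
    aug = [[(1.0 if i == j else 0.0) - coeffs[i][j] for j in range(n)]
           + [1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def summary(inv):
    """Mean and standard deviation of the inverse by column (given j) and by row (given i)."""
    n = len(inv)
    columns = [[inv[i][j] for i in range(n)] for j in range(n)]
    by_demand = [(mean(c), pstdev(c)) for c in columns]   # for given j
    by_output = [(mean(r), pstdev(r)) for r in inv]       # for given i
    return by_demand, by_output


def outputs(inv, final_demand):
    """Industry outputs implied by a vector of final demands."""
    return [sum(b * f for b, f in zip(row, final_demand)) for row in inv]


# Hypothetical coefficient matrices for years 0 and 1.
A0 = [[0.20, 0.30], [0.40, 0.10]]
A1 = [[0.25, 0.30], [0.40, 0.15]]
final_1 = [100.0, 200.0]                 # hypothetical final outputs, year 1

inv0, inv1 = leontief_inverse(A0), leontief_inverse(A1)
by_demand0, by_output0 = summary(inv0)
by_demand1, by_output1 = summary(inv1)

# Deviation of year-1 outputs from those implied by year-0 coefficients.
actual_1 = outputs(inv1, final_1)
predicted_1 = outputs(inv0, final_1)
deviations = [a - p for a, p in zip(actual_1, predicted_1)]
print(by_demand0, by_demand1, deviations)
```

Comparing `by_demand0` with `by_demand1` (or the row summaries) gives one reading of structural change between the two years; `deviations` gives the other measure described above.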

The author is exceedingly careful to stress the limitations imposed by the assump-
tions of input-output analysis as he and others currently practice it and also to stress
the limitations of the basic data available. In the face of this scholarly approach it is
difficult to be critical, but once again one is left with the perplexing question: how 
may all this arithmetic be made to help in dealing with our practical problems? One 
might infer that the author feels developments to this end require that the estimates 
of the parameters be based on more observations and that the restrictions on the vari-
ables be made more general, but he does not address himself directly to these matters.

The book concludes with an appendix on index number problems relating particu- 
larly to the measurement of terms of trade and an appendix describing the basic 
Danish data and its weaknesses. There is a bibliography and an index. The text is 
spotted with a number of inconsequential printer’s errors. 




Industrial Wage and Salary Control. Robert W. Gilmour. New York: John Wiley & Sons, 
Inc., 1956. Pp. viii, 261. $7.50. 


John G. Turnbull, University of Minnesota


JOB evaluation is a technique designed to relate systematically the “inherent” worth
of jobs (measured in non-monetary dimensions) with their money rates. Its use is
predicated upon the belief that the labor market functions imperfectly and hence (1) 
that wage-rate inequities are created which cause employee unrest and (2) that wage- 
rate uncertainties develop which preclude effective wage and salary control. Its use 
has expanded significantly in the past fifteen years, in part because it was a mecha- 
nism by means of which employees and unions were able to secure wage gains other- 
wise denied by the wage and salary controls of World War II, and in larger measure 
because of a belief that it afforded a logical means for wage rationalization. 

Job evaluation involves a variety of administrative issues, such as organizing for 
and installing a plan, and a number of technical issues such as are involved in the de- 
tails of specific plans. It is these latter which are of interest here. Various methods of 
job evaluation are to be found, in part reflecting man’s ingenuity in devising different 
means of ascertaining the inherent worth of jobs. This book by Robert W. Gilmour 
explores one such method; it is “a practical presentation of the development, installa-
tion and administration of point evaluation plans as the basis for sound wage and
salary controls.” 

The logic of point evaluation plans is simple; the sophistication with which Gil- 
mour approaches this type is not. A point plan requires first of all a series of job cri- 
teria, such as skill, effort, responsibility, and working conditions. Each of these is 
given a total point value, say 100. Jobs are then examined and awarded a specified 
number of points for each criterion, dependent upon how the jobs stand relative to 
each other. The points are totaled for each job, and a relationship developed with 
wage rates. This relationship is frequently expressed by a line, sloping upward and 
to the right, reflecting the notion that the higher the point total for the job, the higher the
rate. 
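
The mechanics just described can be illustrated briefly. The job titles, point awards, and hourly rates below are invented for the purpose, and the least-squares line is only one way of drawing the point-wage relationship; nothing here is taken from Gilmour's text.

```python
CRITERIA = ("skill", "effort", "responsibility", "working_conditions")

# Hypothetical point awards, each criterion carrying at most 100 points.
jobs = {
    "sweeper":   {"skill": 20,  "effort": 60, "responsibility": 20, "working_conditions": 50},
    "assembler": {"skill": 50,  "effort": 60, "responsibility": 40, "working_conditions": 50},
    "machinist": {"skill": 80,  "effort": 70, "responsibility": 60, "working_conditions": 40},
    "toolmaker": {"skill": 100, "effort": 70, "responsibility": 90, "working_conditions": 40},
}
# Hypothetical hourly rates in dollars.
wages = {"sweeper": 1.25, "assembler": 1.50, "machinist": 1.75, "toolmaker": 2.00}


def point_total(awards):
    """Sum the points awarded to a job over all criteria."""
    return sum(awards[c] for c in CRITERIA)


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


totals = {name: point_total(awards) for name, awards in jobs.items()}
names = sorted(jobs)
intercept, slope = fit_line([totals[n] for n in names], [wages[n] for n in names])
print(totals, intercept, slope)
```

In this made-up case the rates lie exactly on a line, so each additional point is worth the same amount of money at every level of the structure.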

But how many total points should each factor (criterion) have? An equal number? 
Different numbers? Should, e.g., skill have a total range of 100 points, effort 50? 
Moreover, presumably different point values for each factor will be given to different 
jobs. How fine should such a breakdown be, i.e., how many degrees should each factor 
have? Should, e.g., skill be broken into 5 degrees? Ten? Twenty? More importantly, 
how many points should each degree of the factor have? Should the progression be- 
tween degrees be arithmetic? Geometric? 

Point plans have frequently answered these questions on a hunch and guess basis; 
Gilmour does not. Gilmour assumes (p. 65) that the “use of a straight line to describe 
the resulting [wage] structure should be one of the basic objectives in the development
of a point plan.” If this is the case, then the factors and the degrees of factors need to 
be weighted so as to achieve this linear relationship. Gilmour utilizes the technique of 
multiple correlation to secure factor weights, employing successive correlations to 
refine the weights. 
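
As a rough indication of what such a weighting involves, the sketch below regresses hypothetical wage rates on degree ratings for two factors and converts the coefficients into shares of a 100-point total. It is an ordinary least-squares illustration assumed for exposition only; it is not Gilmour's procedure, which the review describes only in outline, and it omits the successive refinements.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(rows, targets):
    """Coefficients (intercept first) from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(k)] for a in range(k)]
    xty = [sum(d[a] * t for d, t in zip(design, targets)) for a in range(k)]
    return solve(xtx, xty)


# Hypothetical degree ratings (skill, effort) and hourly rates for five jobs.
ratings = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 4)]
rates = [1.30, 1.45, 1.75, 1.90, 2.20]

intercept, skill_coef, effort_coef = least_squares(ratings, rates)

# Express the factor coefficients as shares of a 100-point plan.
total = skill_coef + effort_coef
skill_points = 100 * skill_coef / total
effort_points = 100 * effort_coef / total
print(intercept, skill_coef, effort_coef, skill_points, effort_points)
```

With weights chosen this way, point totals and wage rates fall on a straight line by construction, which is precisely why the reviewer's unease about what is being measured seems worth keeping in mind.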

If one holds that such a linear relationship is logical (and from our relative paucity 
of knowledge about wage structures there is no reason to assume that it is any less 
logical than other types) then the Gilmour method is much more refined than most 
point systems. (One still remains a little uneasy about all this, however, for is one 
really measuring inherent job content, or is one measuring something which statistics 
have “manipulated”? Notwithstanding Gilmour’s citation of the well-known Ezekiel 
caveat, one remains a bit perturbed.) 

While the multiple correlation technique (and allied procedures employed by Gil- 
mour, such as linear and second degree curve fitting) are elementary to statisticians, 
they are not likely to be so to the industrial relations personnel concerned with job
evaluation. Hence, conceptually, this is likely to prove a difficult book for the practi- 
tioner. The book’s jacket notes: “The material calls for no previous training in sta- 
tistics.” This reviewer is doubtful. Gilmour has developed a sophisticated, highly 
useful approach to job evaluation, even if perhaps overrefined, given the dimensions 
of the basic data. But it does require statistical know-how (as well as considerable ex- 
perience with job evaluation installations) if one is to make maximum use of this 
volume. It is not a book for the beginner, even given an appendix on statistical ter- 
minology for the layman. 

What is the moral to all this? One might suggest that an author such as Gilmour 
write a statistical compendium as part of his treatment. The other suggestion is that 
industrial relations personnel need more training in statistics. Notwithstanding the 
efforts of some industrial relations teachers to emphasize the importance of statistics, 
the average trainee (even the wage and salary specialist) is not well equipped. Perhaps 
the statistics profession should push its own wares more vigorously. Statistics is a 
useful tool in the field of industrial relations. 


BOOK REVIEWS 119 


Capital and Output Trends in Mining Industries, 1870-1948. Occasional Paper 45. 
Israel Borenstein. New York: National Bureau of Economic Research, 1954. Pp. 81. $1. 
Paper. 

Orris C. Herfindahl, Resources for the Future, Inc.


THE major finding of this paper is a rise in the ratio of capital to output in the mining
industries over the period 1870 to roughly 1919; the rise was followed by a decline
in the ratio from about 1919 to 1948. The period over which the ratio increased was 
marked by a more rapid growth in mineral output than the period over which the 
ratio fell. The rise and subsequent fall in the capital-output ratio are found to char- 
acterize each mineral industry, although the amounts of rise and fall are not uniform 
for the different industries. The estimates of capital for 1870, 1880, 1890, 1909, and 
1919 are based directly on replies to questions on capital in the mineral censuses for 
those years. Capital estimates for 1929, 1940, and 1948 are derived by applying ratios 
of capital to output derived from corporate income tax data to total U.S. output for
the different industries. 
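
The estimating step for the later years amounts to a simple blow-up by ratio, sketched below. The function name and the figures are hypothetical; they show only the arithmetic of applying a capital-output ratio observed for corporations filing tax returns to the output of an entire industry.

```python
def estimate_capital(tax_capital, tax_output, total_output):
    """Apply the capital-output ratio of the tax-return sample to total output."""
    ratio = tax_capital / tax_output
    return ratio * total_output


# Hypothetical figures, in millions of dollars.
sample_capital = 50.0      # capital reported on corporate tax returns
sample_output = 100.0      # output (receipts) of the same corporations
industry_output = 400.0    # total output of the industry

estimated = estimate_capital(sample_capital, sample_output, industry_output)
print(estimated)
```

The estimate is only as good as the assumption that the corporations in the tax sample have the same capital-output ratio as the industry as a whole, a point the review takes up in connection with the classification of firms.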

Although these findings are carefully qualified, in my judgment the author is in-
clined to overrate their reliability. A large amount of the rise in the ratios from 1870
to 1919 is probably attributable to under-reporting of capital in the early censuses and 
perhaps to the possibility that revaluations following tax law changes got into the 
amounts reported for the 1919 census. It is more likely that the results reported for 
the period following 1919 do reflect a genuine decline in capital-output ratios, but 
until it can be more firmly established that certain factors did not cause a spurious 
decline or until the real factors “causing” the decline are more clearly identified, 
the results should be regarded as decidedly tentative. 

There are two matters of concept that a reader of this paper should keep in mind. 
First, the paper is concerned with the quantity of capital used in the industry and 
not with the flow of services from the capital goods. The difference can be substantial 
for both different industries and different periods of time. 

Second, it is useful to distinguish between current production of a mineral product 
(such as copper ore) and production of a capital good used in current production, 
namely, the mineral deposit. On the output side, the problem of measuring the output 
of the capital good (the mineral deposit) is especially difficult. On the input side, 
measurement is complicated by the fact that some of the activities of a mineral firm 
jointly produce both kinds of product.¹ The definition of capital used in the paper
includes, for firms classified in the minerals industries, the capital goods used in pro- 
ducing both kinds of output, but the output definition does not include the capital 
goods part of the output. 

An attempt is made to separate “land” from other capital, apparently on the ground 
that “land” is not involved in the demand for savings (p. 42). The usefulness of this 
separation seems doubtful, for what gets recorded in the land account depends in part 
on accidents of market turnover, as is recognized on p. 43. More important, it is 
difficult to see why a mineral deposit should be treated differently from other capital 
goods. And in the long run the value of a mineral deposit does tend to reflect real ex- 
penditures on discovery and development (as is recognized for oil on p. 42). 

The author is of the opinion that use of census data for both capital and product 
insures a high degree of comparability in the measures of capital and output (p. 63). 
He recognizes the possibility of error in capital data (p. 75), but is willing to rely on 
them because of their consistent behavior. But the consistency may reflect consistent 
changes in sources of bias. Specifically, I suspect radical under-reporting of capital 


1 One of the more important aspects of “producing” a mineral deposit is finding out more about its extent and 
other characteristics. Current production activities play a part in this accumulation of information. 
in the early censuses with fuller reporting as time went on, although it would be a
mistake to think that the report on capital in mining ever was satisfactory. 

For example, in the Census of 1880² the Special Agent stated that the result of
changing the capital question “. . . has been to increase very greatly the amount re- 
turned as ‘capital of the mining establishments’ . . .” (as compared with 1870). 

In the Census of 1890 in the iron ore section it is said that assessed valuations were
sometimes used.³ In the section on petroleum it is said that the average value of a
well reported for the Census was $1,839, “. . . which, for what is stated above, is
evidently very low.”⁴

In 1902 the capital questions were omitted in view of the past experience with them, 
and we are told, “A careful study of the statistics for the different censuses leads to the 
conclusion that there are no reliable comparative data for all mines and quarries 
other than the quantity and value of products.”⁵

The following statement appears in the 1919 Census, the last mineral census to 
contain a capital question: “The reports received in respect to capital, however, at 
both censuses [1909 and 1919] have in so many cases been defective that the data 
compiled are of value only as indicating very general conditions.”⁶

In addition to fuller reporting of capital in successive censuses, the 1919 values re- 
ceived an additional boost because of revaluations permitted by new tax laws. First, 
under the income tax law the taxpayer’s basis for his properties was the higher of cost 
or fair market value on March 1, 1913. Second, the discovery depletion provisions in 
the Revenue Act of 1918 (passed in 1919) were open to properties discovered after 
1913 and resulted in extensive revaluations, especially in petroleum. It seems likely 
that the 1913 revaluations reached books of account and hence got in the Census. 
Since the administrative organization for discovery depletion was not set up until 
late 1919, it is less clear that these revaluations affected replies to the census, but 
some probably did.⁷

Capital for 1929 and later years is estimated by applying ratios of capital to re- 
ceipts (taken from corporate income tax balance sheet returns) to value of product 
based on censuses and Bureau of Mines data. At this point an important step in the 
argument is omitted, namely, to establish that the reserve for depletion and depre- 
ciation (and hence the net assets) tends to reflect depletion and depreciation taken 
for business purposes rather than depletion and depreciation taken for tax purposes. 
If this were not the case, the assets as recorded in balance sheet returns would have 
become a progressively smaller part of the business accounting asset value because 
tax depletion is a good deal higher than cost depletion and is not limited to the cost 
of the asset. In addition, a substantial amount of capital outlays is expensed in tax 
accounting that is capitalized under “sound” accounting principles.⁸ Balance sheet
data in Statistics of Income commonly reflect “business” rather than tax depletion and
depreciation,⁹ but even so, it is problematical whether “sound business accounting


2 Report on the Mining Industries of the United States (1886), p. xxvii. 

3 Report on Mineral Industries in the United States at the Eleventh Census: 1890 (1892), p. 15.

4 Ibid., p. 433. 

5 Mines and Quarries, 1902 (1905), p. 9. 

6 Mines and Quarries, 1919, p. 15.

7 See Douglas Eldridge, Federal Income Taxation of Mineral Resources (1949), a Ph.D. thesis on file at the Uni- 
versity of Chicago. 

8 According to one estimate, half of the outlays for finding and developing oil are expensed under “financial”
accounting; three-fourths of these outlays are expensed under tax accounting. See J. G. McLean and R. W. Haigh, 
The Growth of Integrated Oil Companies, p. 251. 

9 The method of estimating the value of “plant” (a step in deriving “land” as a residual for 1929 and later years)
is not independent of tax depreciation and depletion, however. The procedure described on page 72 seems to be equiv-
alent, except for averaging, to extrapolating 1919 improvements and equipment by net fixed assets to 1930, and
extrapolating beyond 1930 by depreciation taken on tax returns. The question is how well tax depreciation reflects 
the movement of the “plant” component of non-land, 




practice” in these industries will yield results that fit the economist’s concepts very 
closely. 

One of the main problems resulting from using data from Statistics of Income arises 
from the basis on which firms are classified by industry. Mining and oil companies 
whose principal activity is considered to be smelting or refining—and this would in- 
clude many of the major companies—are classed in manufacturing rather than in 
mining. 

In assessing the importance of this difficulty, the author notes a fairly close agree- 
ment between his petroleum capital estimate and two independent estimates for 
1940 and 1948. He also compares the capital-product ratios for the appropriate groups 
in manufacturing with those in mining, but this is inconclusive, for what is needed is 
a comparison of this ratio for the mining activities of firms classified in manufacturing 
with the ratio for mining activities of those classified in mining. 

The potential importance of these classification difficulties is indicated by the per- 
centage of total corporate depletion taken by corporations classified in mining and 
quarrying. In the ’thirties this percentage ranged from 40 to 60, but in recent years 
it has run around a third.¹⁰ Thus more than half of the corporate mining activity ap-
pears to be classified outside the mining and quarrying industry.

The decline in the ratio of capital to output is especially large for the petroleum 
industry. From 1919 to 1948 the ratio fell by 70 per cent, and from 1929 to 1948 by 50 
per cent.¹¹ These results can be roughly tested by working with data for the domestic
production activities of the large oil companies.¹² My calculations, in which I tried to
duplicate Borenstein’s procedures with the addition of a correction for years of low 
output, indicate a fall in the capital-output ratio from 1934 to 1948 of 23 per cent,but 
the ratio appears to have risen after 1948 so that from 1934 to 1953 the ratio fell by 
only 8 per cent. The annual rate at which Borenstein’s series falls appears to be some 
two and a half to three times greater. 
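
The percentages quoted above cover periods of different lengths, so they are easier to compare as annual rates. The following is only a rough sketch: it assumes a constant compound rate of decline within each period, and the function name is illustrative rather than taken from the monograph. The reviewer's own "two and a half to three times" presumably rests on the two series compared over matching years, which are not reproduced in this review, so the figures printed here need not reproduce that multiple.

```python
def annual_rate_of_decline(total_decline: float, years: int) -> float:
    """Constant compound annual rate of decline implied by a total
    fractional decline over a number of years (an assumed model)."""
    return 1.0 - (1.0 - total_decline) ** (1.0 / years)


# Total declines in the capital-output ratio as quoted in the review.
quoted_declines = {
    "Borenstein, 1919 to 1948": (0.70, 1948 - 1919),
    "Borenstein, 1929 to 1948": (0.50, 1948 - 1929),
    "Reviewer, 1934 to 1948": (0.23, 1948 - 1934),
    "Reviewer, 1934 to 1953": (0.08, 1953 - 1934),
}

for label, (decline, years) in quoted_declines.items():
    rate = annual_rate_of_decline(decline, years)
    print(f"{label}: {100 * rate:.2f} per cent a year")
```

Which pairs of rates are set against each other matters a good deal, since the periods overlap only in part.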

I would not urge that any great reliance be placed on these calculations—they are 
weak at a number of points—but the contrast with Borenstein’s results is sufficient to 
warrant close attention to factors that might explain the behavior of these two series. 
What portion of the observed declines represents a real decline in the ratio of capital 
to output in petroleum production? What portion of the observed declines is the result
of inadequacies of data and necessary departures from proper definitions? Some
of the factors that may be involved can only be listed here. 

The first possibility is that the minor industry of crude petroleum, etc., in Statistics 
of Income does not properly represent oil production activities because of the classi- 
fication of many large oil companies in manufacturing. This may account for a sub- 
stantial part of the difference. 

Other factors that may be involved, some involving real change and some not, 
cause both series to change in the same direction, but the size of the effects may well 
have been different. 

That a genuine decline in the ratio of capital to output has occurred is suggested by 
the decline in the ratio of proved reserves to annual production of crude oil from 21
in 1909-1918 to 12 in 1946-1955.¹³

Because dry holes tend to get expensed, an increase in the ratio of dry holes to wells 
drilled would tend to decrease the ratio of capital to output. Dry holes as a percentage 


10 See Joseph Lerner, “A Discussion of Extractive Industries and National Income Accounting,” a paper given 
at the Conference on Research in Income and Wealth, Nov. 18-19, 1955, revised (mimeographed), p. 26. 

11 These ratios are calculated from values for both capital and output expressed in 1929 prices.

12 See the annual financial analysis of 30 or more large oil companies published by the Chase National Bank 
(now The Chase Manhattan Bank). Titles and authors vary. 

13 American Petroleum Institute, Petroleum Facts and Figures, various years.

of wells drilled has increased by about a half from 1920 to the period after World
War II.¹⁴

Other factors that may be involved include vertical disintegration, the development 
of spacing and production regulations, and the question of the adequacy of business 
accounting for the economist’s needs, including the possibility of over-expensing, 
over-depreciation, and over-depletion. The first of these warrants some comment here. 

Vertical disintegration of field services (notably well drilling) would progressively 
lower the capital figure (but not output) from what it otherwise would be since the 
capital for well drilling by firms that are classified in the minor industry, crude pe- 
troleum, would be reduced as drilling activity was transferred from these firms to 
firms classified in the field service industry. There has been an increasing tendency to 
hire well drilling rather than do it yourself. The percentage of total footage drilled by 
contractors was 68 in 1935, 82 in 1948, and 92 in 1955.¹⁶ A substantial amount of
money is involved, as is suggested by the fact that in 1953 about $2.5 billion was
spent in the U. S. on well drilling.¹⁷

Borenstein’s Occasional Paper is, of course, preliminary and brief. We look forward 
to his full analysis of findings in the more detailed monograph that is to follow. 


Shoe Machinery: Buy or Lease? Revised Edition. Robert N. Anthony. New York: National 
Shoe Manufacturers Association, 1955. Pp. 91. $3.50. 


Harry V. Roberts, University of Chicago


This volume was written to help shoe manufacturers decide whether to buy or
continue to lease shoe machinery, a problem that arose as a result of the court
decree in the United Shoe Machinery Corporation antitrust case. The book’s useful- 
ness, however, extends beyond the specific situation for which it was written. For 
economists and businessmen generally it provides an intelligent discussion of the de- 
cision between leasing and purchasing. For statisticians it provides a blueprint as to 
the kinds of data that are needed and a good treatment of the detailed questions that 
must be considered and resolved in the process of data collection. To my knowledge 
it is the best generally available discussion of the lease or purchase decision. 

The author’s treatment of two important and related aspects of his problem is open 
to the kind of criticism that must be made of all efforts in this area. First, he leaves 
one key number—the “required earnings rate” or discount rate—largely to judgment 
and gives little guidance for the process of judgment. Second, his discussion of the 
allowance for uncertainty in the calculations is not a satisfactory formulation of the 
problem of dealing with the risks associated with the decision. In particular, it is not 
obvious that the differential risks involved in purchase as opposed to leasing are 
higher than the other risks of being in business, as seems to be implied by pp. 39-40. 
The author’s analysis, therefore, does not give authoritative, unequivocal answers. 
It does, however, narrow the areas of judgment and formulate correctly many of the 
conceptual issues that have obscured much of the writing in this field. The author re- 
jects the bookkeeping conventions and arbitrary rules of thumb which have muddled 
most of the literature, and focusses clearly on the economic logic of the problem. 


14 Ibid.

15 Firms engaged primarily in providing field services, including independent drilling contractors, are not included
in the Source Book data from which the ratio of capital to output is taken. If field service firms were included
in both capital and output, the ratio of capital to output in, e.g., 1948, would have been 1.22 instead of 1.40. Field 
services had a capital-output ratio of only .64. 

16 The data are from the American Association of Oilwell Drilling Contractors. 

17 News release of the American Petroleum Institute for January 27, 1956.


The Hard-Surface Floor Covering Industry—A Case Study of Market Structure and Competition
in Oligopoly. Robert F. Lanzillotti. Pullman: State College of Washington Press,
1955. Pp. xiii, 204. $4.00. 


Walter E. Hoadley, Jr., Armstrong Cork Company


Lanzillotti’s book reflects the work of a determined statistical analyst seeking to
develop basic data on the hard-surface floor covering (now more commonly
known as “resilient” or smooth-surface flooring) industry during the period 1929-52 
by which to test contemporary oligopoly theory covering the competitive range be- 
tween pure competition and monopoly. In the course of this industrial market in- 
vestigation the age-old problem of statistical “gaps” was encountered to a much 
greater degree than the author apparently expected at the outset. Nevertheless, he 
used his highly developed statistical research skill, including extensive direct field 
interviews, to fit together fragments of data into some very useful industry informa- 
tion by which to pursue his original objective. 

That the author’s persistent efforts to overcome statistical deficiencies were suc- 
cessful is made clear in the statement of an editor of a leading retail trade daily who 
has taken note of Lanzillotti’s book by saying “. . . the amount (of information) he 
has unearthed is truly astonishing” (my italics). There’s a moral here for many statisticians
and economists facing discouraging statistical gaps: readily available industrial
market data should almost be expected to be grossly inadequate for most
careful studies, but there is always a wealth of raw statistical material to be obtained 
through intensive personal research efforts directed toward executives and other 
industry specialists as well as detailed secondary sources. Moreover, the “outside” pro- 
fessional research investigator can be of service to an industry by critically appraising 
various “trade estimates” found circulating in all markets where reliable statistics 
are neither available nor published regularly. 

It requires considerable statistical training, experience, and courage to make trust- 
worthy market estimates. However, unless the professional statistician demonstrates 
that he has these qualities, he abdicates his responsibilities all too often in favor of 
non-statisticians, including many policy makers, who proceed to fill statistical gaps 
with the crudest type of “guesstimates” simply because they must have some basis for 
making a decision. Lanzillotti’s book is an excellent example of successful statistical 
research in a complex market.

The book’s seven chapters trace the historical background of the industry; outline 
the size-distribution and concentration of producers; pricing practices and price de- 
terminants; competitive behavior; profits, investment, and conditions of entry; and 
apply the findings to oligopoly theory with a final comment on the “workability of 
competition” and the industry outlook. 

Among the noteworthy conclusions drawn by the author from the statistical data 
developed is that the industry has been characterized by such dynamic changes in 
products and size and scope of participants that concentration “probably has declined 
since 1930” and that hard-surface flooring is a “workably competitive” industry. The 
growth potential of the industry’s products is held to be “very bright.” In character- 
izing the industry as “a special sub-case of oligopoly,” the author states that “In addi- 
tion to a few large diversified producers, there are two other groups of firms: (1) a 
fairly large number of appreciably smaller firms producing only one type of floor cov- 
ering, and (2) several large and highly diversified companies whose principal indus- 
trial activities lie outside of the floor covering field.” 

Lanzillotti’s findings in the resilient flooring industry point up a number of major 
shortcomings in contemporary oligopoly theory, particularly in the oversimplification
of interfirm price relationships and “leadership discipline.” It is noted that oligopoly
theory makes insufficient or no allowance for (1) published prices varying frequently 
and markedly from transaction prices; (2) sales of first-line goods at prices quoted for 
“seconds” and “dropped patterns”; (3) multiproduct companies pursuing different 
price policies over time and for different lines of products; (4) independent price 
actions of the competitive fringe around the “oligopoly core”; (5) “lags” in price 
changes among firms, contradicting the “requisite cohesiveness” supposedly found 
among rivals in following the leader’s prices; and (6) “most serious,” the dominant 
firm’s subjective demand function evidently not coinciding or intersecting with its 
own and other sellers’ actual demand curves in such a manner that “maximum (or 
acceptable) profits were realized by all rivals and uncertainty eliminated.” 

While the book has some of the spark of close touch with reality lacking in many 
market studies, it suffers from the same dynamics which it describes. Much has hap- 
pened in the resilient flooring industry since the study was concluded four years ago. 
Part of the outlook section already has become history with not unexpected diver- 
gences between forecast and actual developments. It should be noted that in Table 18, 
p. 159, where per capita usage trends and prospects are shown, the units involved 
are square yards rather than square feet as given. 

On the whole, the book warrants continuing attention and reference, especially by
students and others anxious to make still further progress in market statistical re- 
search, building upon Lanzillotti’s work. As for the resilient flooring industry, Lanzil- 
lotti’s findings have been noted with interest largely as summarized in the trade press. 
Few industry executives, however, seem to have read the book because they are 
rightly convinced by its style and format that it could not have been written for them. 
This raises a basic question which cannot be pursued here, of how best to capitalize 
upon the products of statistical research. 


Prenatal and Paranatal Factors in the Development of Childhood Behavior Disorders. 
Martha E. Rogers, Abraham M. Lilienfeld, and Benjamin Pasamanick. Baltimore, Mary- 
land: The Johns Hopkins University, School of Hygiene and Public Health, 1955. Pp. 157. 
$3.75. Paper. 


J. Yerushalmy, University of California (Berkeley)


Interest of the statistician in this booklet is primarily in the method of study, which
utilizes records which were originally collected for purposes other than the investigation
at hand. This method has certain weaknesses (and the authors are well aware
of them), but it also has much to recommend it. The main strength of the method is 
that the bias which is present in most retrospective type studies is greatly reduced. In 
addition, it provides a relatively inexpensive means for exploratory investigation. 

In the general area of this investigation—the relationship between characteristics 
in the mother during pregnancy and certain defects in the infant—the method as- 
sumes even greater importance. The reason is that the more direct progressive type 
study is not practical because of the difficulty, if not the impossibility, of selecting a 
random sample of pregnant women early enough in their pregnancy. Consequently, 
the retrospective type of study which depends on available records previously re- 
corded was resorted to in this field and utilized with much success. The simple match- 
ing of death and birth certificates, for example, has provided much useful information 
on the association between a number of characteristics in the parents and the prob- 
ability of survival for their offspring. Interesting results were also obtained by Lilien- 
feld and Pasamanick in previous studies on the relationship between abnormalities 
during the prenatal, paranatal and neonatal periods and the later development in the 
child of such disorders as epilepsy, cerebral palsy and mental deficiency. 




On the basis of these findings, the authors postulated a “continuum of reproductive 
casualty,” composed of a lethal component consisting of abortions and neonatal 
deaths and a sub-lethal component consisting of cerebral palsy and epilepsy. In the 
present investigation, the authors test the hypothesis that a similar association exists 
with regard to behavior disorders of childhood, i.e., whether these less serious condi- 
tions also form part of the postulated “continuum.” 

The population studied consisted of Baltimore school children who were referred by 
their teachers for psychiatric guidance because of behavior disorders. Controls were 
drawn from the same classrooms as the cases, choosing the next child alphabetically of 
the same race and sex. The birth certificates for all children, and the hospital records 
for those born in Baltimore hospitals, were secured. 

The method of study is that of comparing the frequency of certain characteristics, 
as determined from the birth certificate and from the hospital record, in the study 
and control groups. The major findings are that certain abnormalities of the prenatal 
and paranatal periods occurred with greater frequency among cases than among con- 
trols. The greatest difference was in the frequency of toxemia of pregnancy and to a 
lesser and not significant degree in bleeding. Of the disorders in the children, the 
hyperactive, confused-disorganized behavior types were more closely associated with 
the abnormalities. The authors suggest that such behavior abnormalities may be 
associated with minor brain damage in the prenatal, paranatal and neonatal periods 
and that the “sub-lethal component of the continuum of reproductive casualty might 
be extended to include a portion of abnormal behavior patterns.” 

The selecting of the matched controls was accomplished at the very first step, 
namely, when the 1,151 cases exhibiting behavior problems who were born in Balti- 
more were identified. The major part of the investigation, however, is based on data 
for only a small portion of these children (471), consisting of single births, those born 
in hospitals for whom complete data were available, those whose birth weights were 
known and who had an IQ score of 80 or above. In addition, some 250 controls had to 
be eliminated, and only 902 controls were used for the 1,151 cases. Of these, 359 sur- 
vived the selective procedures and formed the controls for the 471 cases. Although the 
two groups are not significantly different in most characteristics from the originally- 
matched groups of cases and controls, it would, perhaps, have been better if the se- 
lection of the controls had been performed after the decisions regarding the final com- 
position of the study group. This might have necessitated utilizing also the hospital 
records in the process of selecting the controls, and would have entailed considerably 
more effort, but the extra effort might have been worthwhile. This is especially true 
since in some cases the two groups are not as comparable as is indicated. For example, 
54.6% of the white cases were in the lower economic levels and only 42.5% of the 
white controls. Similarly, the distribution of the cases and controls according to pre- 
vious pregnancies is stated by the authors to be similar; however, only 37% of the 
mothers of the white cases and 16% of the mothers of white controls had had no 
previous pregnancies. These differences are significant at the 5% level. A comparison 
of incidence of prematurity in the two groups with the reported incidence for all Balti- 
more children raises a question of which the authors are aware; however, they can 
offer no explanation for the extremely low proportion of prematures among the white 
controls (2.3%). This is especially disturbing in view of the importance attached by 
the authors to the greater incidence of premature birth among the cases. 
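
The comparability check described above, 54.6% of white cases against 42.5% of white controls in the lower economic levels, can be sketched as a test of the difference between two proportions. This is an illustration under stated assumptions: the review does not say which test was used, so a pooled two-proportion z-test is assumed here, and the review does not report the number of white cases and white controls, so the group sizes below are hypothetical placeholders and not figures from the study.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int):
    """Pooled two-proportion z statistic and its two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Proportions are quoted in the review; the group sizes are HYPOTHETICAL.
hypothetical_n_cases = 300
hypothetical_n_controls = 250

z, p_value = two_proportion_z(0.546, hypothetical_n_cases,
                              0.425, hypothetical_n_controls)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

With smaller groups the same percentages could fall short of the 5% level, which is why the actual subgroup sizes would be needed to check the statement.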

It would also, perhaps, have been better if they had used a more sensitive measure 
for previous losses, namely, the rate of previous infant loss rather than the proportion 
of mothers with one or more such losses. The proportion was found to be twice as 
great among cases as among controls, but the difference is not statistically significant.
It is likely that had they used the more appropriate rate, the results might have been
different and more meaningful. It would also have allowed comparison of the two 
groups according to the number of previous losses. 

The main conclusion, and more especially, the implication that “minimal brain 
injury is a possible factor in the behavior disorders” may not be entirely justified by 
the data. The differences found are far from striking. Moreover, the findings on com- 
plications and prematurity do not seem sufficient in themselves for the speculation 
about “minor brain injury.” 

These points and other limitations mentioned by the authors should not detract 
from the importance of the study. The method employed is useful for investigating 
possible etiological factors for a number of conditions in this “continuum of reproduc- 
tive casualty.” It is almost impossible to execute a more direct progressive study in this 
field. The retrospective type of study must therefore be the method of choice. As used 
by the authors, the study design overcomes the most serious objection to retrospective 
studies—the bias resulting from the fact that the known outcome may prejudice the 
remembering of, and the attaching of importance to, past events. It is desirable, 
therefore, to sharpen this tool by exerting more efforts on the problems of sampling 
and analysis. 


Immigrants and Their Children, 1850-1950. E. P. Hutchinson. Social Science Research
Council in cooperation with the U. S. Bureau of the Census. New York: John Wiley &
Sons, 1956. Pp. xiv, 391. $6.50. 


Petersen, University of Colorado


The census enumeration in 1850 was the first to include a question on the respondent’s
country of birth, and twenty years later one was added on the country of
birth of the respondent’s parents. In every census since then both questions have been 
repeated, and over the century a considerable mass of statistics has accumulated on 
immigrants and their children. It seemed an excellent idea to assemble these scattered 
data into one volume; and a more suitable person could hardly have been chosen to do 
this than Hutchinson, who has combined a distinguished teaching career with the 
supervision of research for the Immigration and Naturalization Service. In spite of 
this promise, however, the work is disappointing, and this is so in several respects. 

The first two chapters present the basic demographic data on the “foreign stock,” 
which is Hutchinson’s useful term for first- and second-generation immigrants taken 
together. This is a disappearing group in the United States. Both foreign-born and 
the second generation are smaller than at any time since the count was first made, and 
they are dying off more rapidly. In 1950, when the median age of the native stock was 
less than 27 years, it was 37 years for the second generation and 56 years for the 
foreign-born. 

The third chapter deals with the geographical distribution of foreign stocks. As one 
might expect, this is generally related to the duration of residence in this country, 
but the usual pattern of gradual dispersal has many exceptions. Sometimes, as with 
the Norwegians, the second generation is more concentrated in the areas of older set- 
tlement than the first. 

The rest of the book—seven out of eleven chapters, or about 200 out of 275 pages— 
is devoted to an analysis of how immigrants and their children have earned their 
living. The 1950 data, based on a 3⅓ per cent sample of the white civilian labor force,
are presented here for the first time and in full detail; and those for the period 1870 
on are taken over from earlier censuses with only a slight abridgement. 

The relation between ethnic background and occupation is certainly an interesting 
and important subject, but Hutchinson’s concentration on this did not in my opinion
yield a comparably significant result, except perhaps to reinforce the point that we
know very little about it altogether. Already in the preface he warns us that “changes 
from census to census in the occupational classification create troublesome problems 
of comparability,” and this injunction is repeated two dozen times throughout the 
book. For this reason, Hutchinson did not feel it permissible, and rightly so, to con- 
struct a single series for the whole period. His analyses of the occupational distribu- 
tion in 1870, 1880, 1890, and 1900 are given in separate chapters, with the concluding 
remarks in each tentatively linking each census only with the immediately following 
one. This rather laborious procedure tells us hardly more about the trend over the 
whole period than could be learned from the non-statistical account of any good his- 
torian. 

From 1910 to 1950 the classification of occupations was more nearly uniform, and 
for this period Hutchinson makes a direct comparison. Foreign-born males, he con- 
cludes, “have not only participated in the general movement of the labor force toward 
more skilled employment but have on the whole progressed more rapidly.” He does 
not attempt to estimate, however, to what degree the apparent more rapid rise of 
immigrant workers was real, and to what degree it was based rather on the shift in 
their social origin from peasants who had emigrated for economic reasons to politi- 
cally motivated urban professionals and skilled workers. 

Data on the country of origin are probably even less comparable over the century 
during which they have been collected than those on occupations. In two long appen- 
dices, which comprise one of the more valuable portions of the volume, Hutchinson 
has collected the instructions to census enumerators concerning nativity and parent- 
age, together with some of the conclusions that census directors and others have 
drawn concerning the accuracy of these data. Hutchinson’s own conclusion is that 
“one may expect, on the whole, that the tendency is toward an underreporting of 
foreign births and parentage.” It is rather too bad that he felt he could not go much 
beyond this bald statement and compare these data, for example, with immigration 
figures or those on the respondent’s native language. The latter set of data in par- 
ticular, unsatisfactory as they are, would have helped to distinguish among the vari- 
ous national groups that migrated from the former Russian and Austro-Hungarian 
empires and gave various and generally unhelpful replies to the question on their 
country of birth. 

In the body of the monograph, Hutchinson usually accepts the nativity and parent- 
age data as given, without comment. Even when there are gross discrepancies in these 
data, he does not call them to the reader’s attention. Thus, from 1910 to 1920, while 
the number of other foreign-born increased by almost a million, those reported as 
having been born in Germany decreased by about 625,000; and while the number of 
native-born of other foreign parentage increased by more than four million, those 
with German parents decreased by 325,000. Two years after the United States had 
been at war with Germany, many of those with a German background presumably 
preferred not to report this. Such an extreme example illustrates both how important 
this discrepancy may be and that it is probably not usually random. In particular, it 
seems reasonable to suppose that on the average the second generation would re- 
port native parents more often than immigrants themselves would become “native- 
born.” If this is so, then Hutchinson’s many comparisons between the two genera- 
tions would have to be corrected. 

How narrowly Hutchinson has interpreted his task can be indicated by comparing 
his volume with an earlier census monograph on the same subject, Niles Carpenter’s 
Immigrants and Their Children, 1920. Carpenter presented the available data relevant 
to such standard indexes of acculturation as native language, intermarriage and fertility
rates, and the pattern of naturalization. His discussion of these matters could now
be improved on in some respects, and those readers familiar with the earlier work 
might reasonably have expected that Hutchinson would bring it up to date in both 
senses. They might have hoped that, like Carpenter, he would follow his subject be- 
yond the specific census data and include also a summary of more recent interests— 
what we know, for example, about the effect of ethnic background on voting behavior. 
Actually, none of these subjects is mentioned. While it is not part of the reviewer’s 
function to instruct the author on the book he should have written, it is appropriate 
to point out the disparity between the broad title, which was taken over from Car- 
penter, and the much narrower range of subject matter. 

Alternatively, if Hutchinson chose to concentrate on the occupations of the foreign 
stocks, he might well have placed his analysis of these statistics in the context of the 
considerable discussion on this question. The 42-volume Report of the Senate Immi- 
gration Commission of 1907-11 he mentions in passing several times, but he does not 
summarize the point of view of its authors. And neither Hourwich’s Immigration and 
Labor nor any of the other works that directly challenged this point of view is even 
mentioned. Since even the original censuses now often include a discussion of the 
social relevance of the raw data they present, such a volume as this should do no less. 


Prediction Methods in Relation to Borstal Training. Hermann Mannheim and Leslie T. 
Wilkins. London: Her Majesty’s Stationery Office, 1955. Pp. vi, 276. $3.15. 


Raymond F. Sletto, The Ohio State University


This is a study to test whether “prediction” tables, similar to those developed in
the United States during the past three decades, would be effective in forecasting
recidivism in England, specifically among boys released from Borstal institutions. 
This research project is one of several being currently financed by the Home Office 
of the Secretary of State under the Criminal Justice Act of 1948. This first monograph 
in a projected series sets a high standard for those to follow. Among its notable fea- 
tures is an exceptionally thorough account of the history of criminological prediction 
studies in the United States, together with references to similar research studies in 
Germany and Finland. 

The initial sample for this study consisted of 748 boys admitted to Borstal insti- 
tutions during a one-year period, taking every third name from a complete list. 
Twenty-five cases were subsequently excluded from this sample due to such reasons 
as deaths and transfers to other institutions, reducing the sample to 723. There was 
a further loss of three cases due to lack of criterion data on outcome after release, 
leaving a final sample of 720. 

Criterion data on outcome were gathered from the Criminal Record Office, Scot- 
land Yard, and from the after-care files of the Home Office. Sixty life history items 
were gathered for each boy from the Borstal records including such factors as age at 
first offense, number of previous convictions, misdemeanors while in Borstal, average 
duration of jobs held, family background, and school performance. As usual in such 
studies, the research workers were considerably handicapped by incomplete and in- 
adequate records, particularly on pre-admission backgrounds. 

Factors were then related to the outcome criterion in tables showing per cents of 
success and failure for categories of each factor. Discriminating factors were those 
yielding per cents differing significantly from the success and failure rates, 45 and 55 
respectively, for the sample as a whole. Chi Square was used to test significance; these 
values were then converted into coefficients of contingency to make them comparable
with product moment and biserial coefficients computed where continuous variables 
permitted their use. 
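
The step from Chi Square to a coefficient of contingency can be sketched as follows. This is a hedged illustration, not the authors' computation: it assumes the usual Pearson form C = sqrt(chi2 / (chi2 + N)), and the three-category factor and its counts are hypothetical, chosen only so that the totals agree with the 720 cases and the 45 and 55 per cent success and failure rates reported for the sample as a whole.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency
    table given as a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, n


def contingency_coefficient(chi2: float, n: int) -> float:
    """Pearson's coefficient of contingency (assumed form)."""
    return math.sqrt(chi2 / (chi2 + n))


# HYPOTHETICAL counts: rows are categories of one factor,
# columns are (successes, failures). Totals: 324 and 396 of 720.
hypothetical_table = [
    [130, 90],
    [120, 140],
    [74, 166],
]

chi2, n = chi_square(hypothetical_table)
c = contingency_coefficient(chi2, n)
print(f"N = {n}, chi-square = {chi2:.2f}, C = {c:.3f}")
```

Because C is bounded below 1 and depends on the size of the table, it is only roughly comparable with product moment and biserial coefficients, which is worth keeping in mind when the factors are ranked.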
The general technique used to find a combination of factors having maximal discriminating
power was that of least squares. Regression weights for factors were compared
with arbitrary unit weights for their efficiency in predicting the criterion, the
advantage lying with the former. Seven factors were found which in combination had 
substantial efficiency. A second set of five factors was found to be more efficient for 
the middle range of score values yielded by the seven factor equation. This differs 
from earlier studies of recidivism in the separation of attributes from variables in 
the solution of regression equations, and subsequently combining them into a single 
score. The procedure involved the use of probits in solving for the attributes, their 
conversion into an artefact variable, then solving for the values of the variables in- 
cluding the artefact. 

The efficiency found for the new predictive tables compares favorably with that 
yielded in previous studies. Methodological innovations presented in this monograph 
deserve the attention of statisticians concerned with the improvement of prediction 
from experience tables. The authors have made a major contribution to the literature 
on prediction of criminal recidivism. 

of Property Tax Administration in Illinois 
and Michigan. Michigan Governmental 
Studies, No. 33. Ann Arbor: University of 
Michigan, 1956. $3.00. Paper. 

Petersen, William, Editor. American 
Social Patterns. New York: Doubleday and 
Company, Inc., 1956. $0.95; in Canada, 
$1.10. Paper. 

Reitzel, William, Kaplan, Morton A., and 
Coblenz, Constance G. United States 
Foreign Policy 1945-1955. Washington: 
Brookings Institution, 1956. $4.50. 

Roosa, Robert V. Federal Reserve Operations 
in the Money and Government Securities 
Markets. New York: Federal Reserve 
Bank of New York, Public Information 
Department, 1956. Free. Paper. 

Siegel, Sidney. Nonparametric Statistics 
for the Behavioral Sciences. New York: McGraw-Hill 
Book Company, Inc., 1956. 

Stigler, George J. Trends in Employment 
in the Service Industries. National Bureau of 
Economic Research, General Series No. 59. 
Princeton: Princeton University Press, 
1956. $3.75. 

United Nations Educational, Scientific and 
Cultural Organization. International 
Organizations in the Social Sciences. New York: 
Columbia University Press, 1956. 50 cents. 
Paper. 

Zetterberg, Hans L., Editor. Sociology 
in the United States of America. Paris: 
United Nations Educational, Scientific and 
Cultural Organization, 1956. Paper. 


New McGRAW-HILL Books 


INTRODUCTION TO STATISTICAL ANALYSIS 


By W. J. DIXON and FRANK J. MASSEY, both of the University of Cali- 
fornia, Los Angeles. New Second Edition. 488 pages, $6.00 


An excellent revision of one of the most popular of mathematical statistics texts. 
With no calculus prerequisite, it has been adopted in a variety of situations ranging 
from math departments and business administration, to biology and agriculture. It 
presents the basic concepts of statistics in a manner which will show the student 
the generality of the application of the statistical method. Much material has been 
brought up to date, with the latter chapters expanded to include important topics. 


ELEMENTARY STATISTICS 
For Students of Social Science and Business 


By R. CLAY SPROWLS, University of California, Los Angeles. 422 pages, $5 


A basic, elementary text for all social science and liberal arts students. It deals 
primarily with the formulation of decisions based upon incomplete information. It 
considers statistics important as inference, not description. Emphasis is on principles 
of inference, the ideas of hypotheses, risks of error, and the evaluation of these risks 
in terms of the operating characteristics of a statistical test. 


A PRIMER OF SOCIAL STATISTICS 


By SANFORD M. DORNBUSCH, Harvard University, and CALVIN F. 
SCHMID, University of Washington. 264 pages, $4.75 


Written with simplicity and clarity, this up-to-date coverage of basic statistical 
concepts and techniques is unique in that it presumes no mathematical knowledge 
beyond arithmetic. Appropriate mathematical information is introduced at the point 
where it becomes necessary. Emphasis is placed on the nature of statistical reasoning, 
including appropriate explanations of the logic underlying various statistical 
concepts and techniques. There are discussions on the derivation, application, and 
interpretation of various statistical tools. 


INTRODUCTION TO STATISTICAL REASONING 
By PHILIP J. MCCARTHY, Cornell University. Ready in July 


A thorough and sound elementary text, with the basic elements of statistical reasoning 
rigorously presented, on the assumption that the student has had little mathematical 
training. Illustrative material has been drawn primarily from the social 
sciences. After a review of the traditional features of descriptive statistics, the text 
introduces concepts of confidence interval estimation and hypothesis testing 
procedures. A set of exercises follows each chapter. 


Send for copies on approval 


McGRAW-HILL BOOK COMPANY, INC. 


330 WEST 42nd STREET, NEW YORK 36, N. Y. 

JUST PUBLISHED 
A complete introduction to... 


Principles of STATISTICAL ANALYSIS 


SAMUEL B. RICHMOND, Columbia University 


A new, eminently teachable textbook, carefully designed 
as an introduction to statistical analysis for students of busi- 
ness and economics. Detailed illustrative material combines 
with the text to present a thorough treatment of the col- 
lection, analysis, and presentation of statistical data. The 
book is organized around the modern concept of statistical 
induction. Mathematical procedures are kept to a minimum, 
and those techniques employed are introduced and explained 
at the point of use. A unique feature is a Glossary of Equa- 
tions, in which each basic equation used in the text is listed, 
located, and briefly explained. Individual problems provide 
maximum coverage of essential principles. 

175 ills., tables; 450 pp. 


THE RONALD PRESS COMPANY • 15 E. 26th St., New York 10 


MATHEMATICIAN 
CHEMICAL ENGINEER 


Recent graduate men needed to program problems of engineering and 
statistical nature for: 


I. B. M. 650 and 705 
Data Processing Machines 


Position requires the ability to analyze various types of problems, to 
develop numerical and logical techniques adaptable to electronic com- 
puters. 

Degree in mathematics or engineering required; experience in numerical 
analysis and programming desirable but not necessary. Salary com- 
mensurate with experience or potential; opportunity for advancement. 
Write giving resume of age, education, present location, and previous 
employment if any. Personal interview will be arranged. 


THE STANDARD OIL CO. 
606 GUILDHALL, CLEVELAND 15, OHIO 


Please mention the Journal of the American Statistical Association in writing advertisers 


opportunities for 
mathematicians and statisticians 
at the 


GENERAL ELECTRIC RESEARCH LABORATORY 


The Information Studies Section of the General Electric 
Research Laboratory in Schenectady has several challeng- 
ing opportunities for mathematicians and statisticians 
who are prepared for and interested in pursuing a career 
of fundamental research in such fields as the logic of in- 
formation processing machines, and information theory. 


Prompt attention will be given to inquiries addressed to 
Dr. Lewis Eldred, Research Laboratory, General Electric 
Company, Box 1088, Schenectady, New York. 


Imaginative Quality Control Manager 


NEEDED: 
Outstanding young man to establish and direct Quality 
Control program for multi-division corporation 


FIRM: 
Established food producing and distributing firm in initial stages 
of expansion program 
Headquartered in large southern city 


QUALIFICATIONS: 
Age 30-35 
College Degree 
Knowledge of statistical principles 
Experience in administering quality control programs 
Ability to establish, direct and control effective program 


COMPENSATION: 
$10,000 and up depending upon individual qualifications plus 
profit sharing bonus 


AMERICAN STATISTICAL ASSN., BOX 200 
1757 K STREET, N.W., WASHINGTON 6, D.C. 




announcing for 1957: 


OPERATIONS RESEARCH AND ELECTRONIC 
COMPUTERS FOR MANAGEMENT 


R. Clay Sprowls and James R. Jackson 
both of U.C.L.A. 


This soundly organized text serves to introduce operations research and 
electronic computers to students of management. Working on a how-to- 
do-it basis, it presents operations research as a scientific approach to 
decision making. The workings and uses of electronic computers are 
thoroughly covered in the last half of the book. Principles rather than 
specific computers are emphasized. There is enough material for a com- 
plete course in either subject. 


Ready in Summer 


and a 1956 success: 


INTRODUCTION TO STATISTICS 


Frederick C. Mills 
Columbia University 


Tailored to fit the needs of one-semester courses, this abridgement of 
Professor Mills’ highly respected STATISTICAL METHODS presents 
concepts and procedures for beginning students. Emphasis on the concept 
of statistical inference and on practical applications unifies all topics and 
adds greatly to the teachability of the text. 
“The abundance of material, the prolific use of illustrations, the precise- 
ness of definitions and concepts, and the lucidity of presentation make 
this book a welcome addition. . .” 

Wayne B. Moeller, Texas A. & M. College 


HENRY HOLT AND COMPANY 

INTRODUCTION TO MODERN STATISTICS 
With Application to Business and Economics 


by Werner Z. Hirsch, Washington University 


Presupposing no mathematical knowledge beyond ele- 
mentary algebra, this basic text for students of business 
and economics introduces the student to modern infer- 
ence methods and gives him a sound understanding of 
the tools and ideas of statistics, particularly as they 
apply to decision-making. 


Comprehensive and stimulating, this text... 


. introduces each problem by a concrete example 
drawn from actual experience and shows what 
methods are applicable and how they can help 
solve the problem 


. presents modern statistical concepts and their 
uses in an intelligently satisfying manner and, at 
the same time, enlivens the subject with actual 
business examples and pertinent, often humorous, 
anecdotes and illustrations 


. bases the material upon the probability approach 
and includes such modern topics as managerial 
and quality control, electronic computers and 
sampling and bias 


. offers numerous interesting classroom-tested 
exercises 
Ready late Spring 1957 


EXPERIMENTAL DESIGN: 
THEORY AND APPLICATION 
by Walter T. Federer, Cornell University 


For all courses in the design and analysis of experiments 
1955 591 pages $11.00 


The Macmillan Company 
60 FIFTH AVENUE, NEW YORK 11, N.Y. 

PRINCETON UNIVERSITY 
PRESS 


The Measurement and Behavior of Unemployment 


A CONFERENCE OF THE UNIVERSITIES-NATIONAL BUREAU COMMITTEE FOR 
ECONOMIC RESEARCH 


Here at last is a comprehensive exploration of the abundant statistics on 
unemployment. In eleven papers, contributors discuss the meaning and 
measurement of unemployment and full employment and the behavior 
of unemployment in the United States, nine other Western nations, and 
the USSR. Also included are comments on the papers by experts in the 
field and an Introduction which puts the volume in perspective. Published 
for the National Bureau of Economic Research. 600 pages. Charts. $7.50 


Problems of Capital Formation 
Concepts, Measurement, and Controlling Factors 
By THE CONFERENCE ON RESEARCH IN INCOME AND WEALTH 


The fifteen papers here, by Bert Hickman, Ruth Mack, Franco Modigliani, 
Robert Eisner, and others, add much to our understanding of 
private capital formation and of its major components—plant and 
equipment, residential construction, and changes in inventories. Studies 
in Income and Wealth, Volume Nineteen. Published for the National 
Bureau of Economic Research. 


Typewritten, offset. 608 pages. Charts. $7.50 


Trends in Employment in the Service Industries 
By GEORGE J. STIGLER 


The author presents and analyzes employment trends since 1870 in the 
service industries, in which he includes trade, finance, real estate, the 
professions, domestic and personal service, government, and other related 
occupations. Professor Stigler combines scholarship of a high order 
with a graceful style in showing that these service industries have had 
a substantially unbroken growth relative to the total labor force in the 
United States. Published for the National Bureau of Economic Research. 


180 pages. Charts. $3.75 


Order from your bookstore, or 
PRINCETON UNIVERSITY PRESS 


Princeton, New Jersey 

MARKETING RESEARCH 


While it is our business to provide 
tabulating facilities that utilize all the 
latest equipment for speed, accuracy 
and economy, we believe that our 
responsibility to research directors 
does not start with these electro- 
mechanical services. 

Long experience in working with 
publishers, agencies and research de- 
partments has demonstrated to us 
that the quality of the finished reports 
depends in large measure on the 
thinking and planning that go into 
the project from its inception. 

STATISTICAL 
TABULATING CORPORATION 
Established 1933 - Michael R. Notaro, President 
TABULATING + CALCULATING + TYPING 
TEMPORARY OFFICE PERSONNEL 

That’s why STATISTICAL makes professional 
help available to you, right 
from the start. In resolving your 
ideas. In translating sound thinking 
into all-inclusive questionnaires. And 
then backing up this all-important 
preliminary assistance with responsible 
help in the mechanical and 
analytical phases of your project. 

Just a few minutes with one of our 
research specialists will show you 
what this complete service can mean 
to you in producing the quality 
analyses and interpretations required 
in today’s competitive markets. 


Just phone our nearest office. 


CHICAGO 
53 West Jackson, HArrison 7-4500 
NEW YORK 
80 Broad Street, WHitehall 3-8383 
ST. LOUIS 
411 N. Tenth St., MAin 1-7777 
NEWARK 
National-Newark Bldg., MArket 3-7636 
CLEVELAND 
1367 E. 6th St., SUperior 1-8101 


TABULATIONS 


The Missile and Ordnance Systems Department of General 
Electric Company, prime contractor for the ICBM and IRBM 
nose cones, has a position open on its professional staff for a 
Mathematical Analyst. Position responsibilities include: (1) liaison 
and consultation with engineering personnel to determine 
and define engineering and scientific problems for solution 
by computer mathematics, (2) translation of such problems into 
mathematical form suitable for solution by digital 
of computer programmers. 


QUALIFICATIONS DESIRED: 
... MS or PhD or equivalent 
... Good knowledge of applied mathematics and numerical analysis 


LIBERAL SALARY 
BROAD BENEFITS PROGRAM 
EXCELLENT FACILITIES AND EQUIPMENT 
(IBM 650 & IBM 704) 


Send resume in confidence to: 
Mr. John Watt, Room 555-5 
MISSILE AND ORDNANCE SYSTEMS DEPT. 
3198 Chestnut Street, Philadelphia 4, Pa. 


AMERICAN ECONOMIC REVIEW 


Volume XLVI December 1956 Number 5 
ARTICLES 

A Macroeconomic Theory of Wages .... Sidney Weintraub 
Fiscal Policy in the ’Thirties: A Reappraisal .... E. C. Brown 
A Rehabilitation of Export Subsidies .... Wemelsfelder 
A Theory of the Low-Level Equilibrium Trap .... R. R. Nelson 
Theory of the Reluctant Duelist .... Daniel Ellsberg 
Malthus on Money Wages and Welfare .... W. D. Grampp 


REVIEW ARTICLES 


The Soviet Textbook on Economics .... Surányi-Unger 
Patinkin’s Monetary and Value Theory .... William Fellner 
Survey of Ceylon’s Consumer Finances 


Review of Books, Titles of New Books, Periodicals, Notes 


The AMERICAN ECONOMIC REVIEW, a quarterly, is the official publication of the 
American Economic Association and is sent to all members. The annual dues are six 
dollars. Address editorial communications to Dr. Bernard F. Haley, Editor, AMERI- 
CAN ECONOMIC REVIEW, Stanford University, Room 220, Stanford, California; 
for information concerning other publications and activities of the Association, 
communicate with the Secretary-Treasurer, Dr. James Washington Bell, American 
Economic Association, Northwestern University, Evanston, Illinois. Send for 
information booklet. 

ESTADISTICA 


Journal of the 
Inter American Statistical Institute 


Vol. XIV, No. 52 CONTENTS September 1956 


ARTICLES 


Aplicaciones Estadisticas en las Ciencias Fisicas en los Estados Unidos 
(traducción) William R. Pabst, Jr. 


Algunos Errores en la Declaración de Edad en los Censos de Población de 1950 
en Centro América y México Jorge Arias B. 


Aproximación de la Tasa Anual Promedio de Cambio Jacob S. Siegel 
Sources, Procedures of Compilation, and Types of Current Industrial Statistics in 


Dominion Bureau of Statistics 
Programación del Desarrollo 


Oficina de Estadistica de las Naciones Unidas 
Collection of Mental Disease Statistics in the United States 
Problemas en la Aplicación del Muestreo en Encuestas Agropecuarias en la América 
Latina Oficina de Estadistica de la FAO 


Problemas Encontrados en Estudios de Gastos de la Familia Hechos Recientemente 
en Paises Latinoamericanos Pauline B. Paro 


Special Features. Legal Provisions. International Resolutions relating to Statistics. 
Institute Affairs. Statistical News. Publications. 


Published quarterly Annual subscription price $3.00 (U.S.) 


Inter American Statistical Institute 
Pan American Union, Washington 6, D.C. 


POPULATION STUDIES 


A JOURNAL OF DEMOGRAPHY 
Edited by D. V. GLASS and E. GREBENIK 


Volume 10, No. 2 November, 1956 
CONTENTS 
T. van Heek: Roman-Catholicism and Fertility in the Netherlands: The 
Demographic Influence of a Remarkable Minority Position. 
K. L. Gillion: The Sources of Indian Emigration to Fiji. 
I. M. Cumpston: A Survey of Indian Immigration to British Tropical 
Colonies to 19 
R. Mansell Prothero: The Population Census of Northern Nigeria 1952: Prob- 
H. S. Halevi: in Israel. 
C. A. L. Myburgh: Estimating the Fertility and Mortality of African Populations. 
Book Reviews. 
List of Books and Publications Received. 


Subscription price per volume of 3 parts 35/—net, post free (or American cur- 
rency $5.75). 


Single parts 15/—each plus postage (American $2.50, post free). 


Published by the Population Investigation Committee at the London 
School of Economics and Political Science, 15 Houghton Street, 
London W.C.2. 

THE JOURNAL OF MARKETING 


Volume XXI JANUARY, 1957 Number 3 


Biography on Daniel Starch on Biographies 
The Changing Role of the Marketing Function .... Henry Bund and James W. Carroll 
Agricultural Marketing and the Italian Economy .... Giuseppe Orlando 
The Relationship Between Income and Retail Sales in Local Areas .... Vera Kilduff Russell 
A Note on Marshall and Selling Costs ... 
Store Merchandise Publicity on the Women’s Pages .... Steven J. Shaw 
What Is a Pictorial Projective Technique? .... Richard L. Lysaker and Joseph E. Bradley 
How Does Business Regard University Advertising Courses? .... Frank Dunbaugh 
On the Reliability of Mail Questionnaires in Product Tests .... 
and Manuel N. Manfield 
ia S, Penn, jr., Editor 
Legislative and Judicial William F. Brown, Editor 
Book Reviews ...... Edwin H. Lewis, Editor 
THE JOURNAL OF MARKETING is published quarterly in January, April, July and October 


at 73 Main Street, Brattleboro, Vermont. It is the official publication of the American Marketing 
Association, 27 East M Street, Chicago 3, Illinois. 


The annual subscription rate of THE JOURNAL OF MARKETING is $6 domestic; $7 foreign; 
postage prepaid. The single copy rate is $1.75 domestic; $2 foreign; postage prepaid. 


ECONOMETRICA 
Journal of the Econometric Society 


Vol. 25, No. 1—January, 1957 
CONTENTS 
ARTHUR SMITHIES: Economic Fluctuations and Growth 
TJALLING C. KOOPMANS AND MARTIN BECKMANN: Assignment Problems and the Location 
of Economic Activities 
R. L. BASMANN: A Generalized Classical Method of Linear Estimation of Coefficients in a 
Structural 
HARRY M. MARKOWITZ AND ALAN S. MANNE: On the Solution of Discrete Programming 
Problems 

H. THEIL: Linear Aggregation in Input-Output Analysis 

G. STUVEL: A New Index Number Formula 

A. CHARNES AND W. W. COOPER: Nonlinear Power of Adjacent Extreme Point Methods in 
Linear Programming 

W. J. BERGER AND EDWARD SAIBEL: Power Series Inversion of the Leontief Matrix 

H. UZAWA: Note on the Rational Selection of Decision Functions 

S. FUJINO: A Theory of Econemic Fluctuations in a Capitalist E y—E ics of Cycles 
and Growth, by Michio Morishima (A Review Article) 

BOOK REVIEWS 

Flow of Funds in the United States 1939-1953 (Board of Governors of the Federal Reserve System). 
Review by Gerhard Colm 

Rank Correlation Methods (M. G. Kendall). Review by 

Politicheska’a ekonomi’a: Uchebnik (Political E. y of Sci of the 

USSR, Economic Institute). Review by Eberhard Fels 

A(ndrei) A(ndreevich) Markov, Isbrannye trudy: teoria chisel, teoria wero’atnostei a Markov’s 
Selected Works on the Theory of Numbers and the Theory of Probability) (Yu. V. Linnik, 
ed.). Review by Eberhard Fels 

Kurs teorii vero'atnostei (A Course in the Theory of Probability) (B. V. Gnedenko). Review by 
Eberhard Fels 

Income of the American People (Herman P. Miller). Review by Ruggles 

La “‘Cowles Commission" ed i modelli i (Ad Predetti). Review by R. S. Eckaus 

The Theory of Economic Growth (W. Arthur Lewis). Review by K. K. Kurihara 

Interindustry Economic Studies: A Comprehensive Bibliography on Interindustry Research (Vera 
Riley and Robert Loring Allen). Review by R. N. Grosse 

BIOMETRICS 


Journal of the Biometric Society 


Vol. 12, No. 4 CONTENTS December 1956 


The Statistical Analysis of a Complex Experiment Involving Unintentional 
Restraints (D. J. Finney and F. W. Cope) ; Simplified Analysis of Singly Linked 
Blocks (K. R. Nair); A Rankit Analysis of Paired Comparisons for Measuring 
the Effect of Sprays on Flavor (C. I. Bliss, M. L. Greenwood and E. S. White) ; 
Theoretical and Experimental Study of Self Fertilized Populations (T. W. 
Horner and C. R. Weber); The Discrimination of Interactions and Linkage in 
Continuous Variation (Birger Opsahl) ; Applications of the k, statistic to Genetic 
Variance Component Analysis (D. S. Robson); Note on Wald’s Method of 
Fitting a Straight Line When Both Variables are Subject to Error (E. S. Keep- 
ing) ; Recent Advances in Biometry in Japan (M. Masuyama and M. Hatamura) ; 
Control of Errors in Surveys (Morris H. Hansen and Joseph Steinberg) ; The 
Study of Physiological Effects of Hot Climates (J. O. Irwin) ; Confidence Limits 
for Measuring the Precision of Bioassays (C. I. Bliss). 


Biometrics is published quarterly. Its objects are to describe and exemplify 
the use of mathematical and statistical methods in biological and related sci- 
ences, in a form assimilable by experimenters. Members of the American Sta- 
tistical Association may subscribe through the Association at the rate of $4 
yearly. The annual non-member subscription rate is $7. Inquiries, orders for 
back issues and non-member subscriptions should be addressed to: 


BIOMETRICS 
National Research Council 
Ottawa 2, Canada 

The Annals of Mathematical Statistics 


THE OFFICIAL JOURNAL OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


Vol. 28, No. 1—March, 1957 


CONTENTS 
A Theory of Some Multiple Decision Problems, I ........... E. L. Lehmann 
Estimates for Global Central Limit Theorems .........Ralph Palmer Agnew 


of Variance Components: II. The 


ane cuts 


Statistical Decision ose 
M. A. Girshick, S. Karlin, and H. L. Royden 


able Linear Statistics Laha 


On the Estimation of sana in ~~ Series 


Approximations to the Power 
Some Uses of Quasi-Ranges ........... 
Modified Randomization Tests for Nonparametric Hypotheses .... Meyer Dwass 


On Certain Two-sample Nonparametric Tests for hie ~~ ere 
Balkrishna V. Sukhatme 


Multi-factor Designs for Exploring Surfaces ....... 
and J. S. Hunter 


Consistency of Certain Two-sample Tests ...... J. R. Blum and Lionel Weiss 
A Note on Truncation and Sufficient Statistics Walter L. Smith 
A Central Limit Theorem for Multilinear Stochastic Processes . .Emanuel Parzen 
On the Enumeration of Decision Patterns a = Means 


Percentiles of the Wa Statistics .......... oe 

On the Serial Test for Random Sequences ........... éeqeniaw 

On the Specification Error in Regression Analysis ...... H. Wold and P. Faxér 

Sets of Measures Not Admitting Necessary and Sufficient Statistics or Subfields 
.... T. S. Pitcher 


A Note on Combined Interblock and Intrablock Estimation 


News and Notices 
Publications Received 


Address orders for subscriptions and back numbers to Professor George E. 
Nicholson, Jr., Secretary, Institute of Mathematical Statistics, Department of 
Statistics, University of North Carolina, Chapel Hill, North Carolina. 

Do you belong in IBM Applied Science? 
This Consulting Job is now open! 


Before his recent promotion, this man was 
an IBM Applied Science representative, 
working out of the New York office. Dur- 
ing the past two years, he has shown in- 
numerable customers how to do things 
electronically. For example, an aircraft 
manufacturer wanted to experiment with 
a radically different design for a nuclear 
reactor. Months of toil with mathematical 
equations were indicated. But this Applied 
Science Consultant was able to map out a 
program that saved the manufacturer over 
100 days of pencil-chewing arithmetic. 
Later, for this same company, he organized 
the establishment of computer systems for 
aircraft performance predictions and for 
wing stress analysis. 


Could you fill it? 


Rewarding careers are now open to 
men with degrees in: 

Mathematics • Physics 

Engineering • Chemistry 

Statistics • Economics 


IBM Applied Science has quad- 
rupled its staff during the past three 
years. In 1956, over 70 promotions 
were conferred. Doesn't this growth 
factor alone suggest more room for 
your abilities—more professional 
stature? 


Why not act today? Write, out- 
lining the details of your back- 
ground and interests, to: 


P. H. Bradley 

Dept. 10703 

IBM Corporation 

590 Madison Avenue 

New York 22, N. Y. 
Be sure to visit the IBM booth at the 
I. R. E. Show, March 18 through 21. 


Responsibilities: 


• Advise customers and prospects in regard 
to the scientific and technical applications 
of IBM electronic equipment. 


• Analyze customer's technical problems 
in terms of machines and their applications 
for the mutual benefit of the customer 
and IBM. 


• Deliver talks about the computing field, 
supported by demonstrations, to customers, 
prospects, scientific groups, and 
IBM personnel. 


• Maintain constant and close contact 
with the customer’s top management and 
associated IBM executive. 


• Continually analyze customer applications 
and develop new machine uses. 


IBM maintains approximately 100 Applied 
Science offices throughout the length and breadth 
of the United States. You may request assign- 
ment in the location that is most desirable to you. 


INTERNATIONAL 
BUSINESS MACHINES 
CORPORATION 


DATA PROCESSING 
ELECTRIC TYPEWRITERS 
TIME EQUIPMENT 
MILITARY PRODUCTS 

OPERATIONS 
RESEARCH 


Operations research is a fast growing and practical 

science attracting some of the best brains in the 

country. Its future is unlimited. If you want to join 

a group of pioneers in this exciting field, we invite 

you to investigate the openings on our staff. 

On our part we offer: 

1. A record of experience in operations research, 
out-distanced by perhaps no other organization. 

2. A scrupulously maintained professional ap- 
proach and atmosphere. 

3. The team approach to problem solving. On each 
team are representatives of varied disciplines — 
sometimes three, occasionally as many as a dozen. 

4. Fully equipped digital and analog computing 
facilities. 

5. ORO occupies several buildings in Chevy Chase, 
Maryland, one of America’s most attractive 
suburbs. Pleasant homes and apartments in all 
price ranges are available. Schools are excellent. 
Downtown Washington, D. C. with its many 
cultural and recreational advantages is but a 
20-minute drive. 

6. Favorably competitive salaries and benefits, extensive 
educational programs, unexcelled leave 
policy. 


FOR DETAILED INFORMATION, WRITE: 
Dr. L. F. Hanson 


OPERATIONS RESEARCH OFFICE 


The Johns Hopkins University 


7100 CONNECTICUT AVENUE 
CHEVY CHASE 18, MARYLAND 